Preface



CIARP 2003 (8th Iberoamerican Congress on Pattern Recognition) was the eighth event 
in a series of pioneering congresses on pattern recognition in the Latin American com- 
munity of countries. This year, however, the forum was extended to include worldwide 
participation. The event has been held in the past in Mexico, Cuba, Brazil and Portu- 
gal; it took place this year in Havana (Cuba). The aim of the congress was to promote 
and disseminate ongoing research into mathematical methods for pattern recognition, 
computer vision, image analysis, and speech recognition, as well as the application of 
these techniques in such diverse areas as robotics, industry, health, entertainment, space 
exploration, telecommunications, data mining, document analysis, and natural language 
processing and recognition, to name a few. Moreover, it was a forum for scientific
research, experience exchange, the sharing of new knowledge, and establishing contacts to
improve cooperation between research groups in pattern recognition, computer vision 
and related areas. 

The congress was organized by the Institute of Cybernetics, Mathematics and Phy- 
sics of Cuba (ICIMAF) and the Center for Computing Research (CIC) of the National 
Polytechnic Institute of Mexico, and was sponsored by the University of La Salle, Me- 
xico, the University of Oriente, Cuba, the Polytechnic Institute “Jose A. Echevarria,” 
Cuba, the Central University of Las Villas, Cuba, the National Center of Scientific Re- 
search, Cuba, the Cuban Association for Pattern Recognition (ACRP), the Portuguese 
Association for Pattern Recognition (APRP), the Spanish Association for Pattern
Recognition and Image Analysis (AERFAI), the Mexican Society for Artificial Intelligence
(SMIA), and the International Association for Pattern Recognition (IAPR).

This year the event captured the attention of a significant group of researchers who
contributed over 140 full papers from 19 countries. Out of these, 82 papers were
accepted as full papers and are included in this proceedings volume, and 28 papers were 
accepted as poster presentations. The review process was carried out by the Program 
Committee, each paper being assessed by at least two reviewers who, in conjunction 
with other reviewers, prepared an excellent selection dealing with ongoing research. We 
are especially indebted to them for their efforts and the quality of the reviews. 

Three professors were invited to give keynote addresses on topics in computer vision, 
robot vision and pattern classification: Dr. Rangachar Kasturi, IAPR President, and a
professor in the Department of Computer Science and Engineering at Pennsylvania State 
University, Dr. Gerhard Ritter, Chairman of the Computer and Information Science and 
Engineering Department at the University of Florida, and Dr. Alberto Sanfeliu, past 
President of AERFAI, and a professor at the Technical University of Catalonia. 

We appreciate very much all the intense work done by the members of the organizing 
committee that allowed for an excellent conference and proceedings. We hope that this 
congress was a fruitful precedent for future CIARP events. 
Abstract. Computation in a neuron of a traditional neural network is
accomplished by summing the products of neural values and connection 
weights of all the neurons in the network connected to it. The new state 
of the neuron is then obtained by an activation function which sets the 
state to either zero or one, depending on the computed value. We provide 
an alternative way of computation in an artificial neuron based on lattice 
algebra and dendritic computation. The neurons of the proposed model 
bear a close resemblance to the morphology of biological neurons and 
mimic some of their behavior. The computational and pattern recog- 
nition capabilities of this model are explored by means of illustrative 
examples and detailed discussion. 



1 Introduction 

Various artificial neural networks (ANNs) that are currently in vogue, such as 
radial basis function neural networks and support vector machines, have very 
little in common with actual biological neural networks. A major aim of this 
paper is to introduce a model of an artificial neuron that bears a closer re- 
semblance to neurons of the cerebral cortex than those found in the current 
literature. We will show that this model has greater computational capability 
and pattern discrimination power than single neurons found in current ANNs. 
Since our model mimics various biological processes, it will be useful to provide 
a brief background of the morphology of a biological neuron. 

A typical neuron of the mammalian brain has two processes called, respec- 
tively, dendrites and axons. The axon is the principal fiber that forms toward its 
ends a multitude of branches, called the axonal tree. The tips of these branches, 
called nerve terminals or synaptic knobs, make contact with the dendritic struc- 
tures of other neurons. These sites of contact are called synaptic sites. The 
synaptic sites of dendrites are the places where synapses take place. Dendrites 
have many branches that create large and complicated trees and the number 
of synapses on a single neuron of the cortex typically ranges between 500 and 
200,000. Figure 1 provides a simplified sketch of the processes of a biological 
neuron. It is also well known that there exist two types of synapses: excitatory
synapses that play a role in exciting the postsynaptic cell to fire impulses, and 

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 1-16, 2003. 
Fig. 1. Simplified sketch of the processes of a biological neuron



inhibitory synapses that try to prevent the neuron from firing impulses in re- 
sponse to excitatory synapses. The postsynaptic membranes of the dendrites will 
thus either accept or inhibit the received input from other neurons. 

It is worthwhile to note that dendrites make up the largest component in 
both surface area and volume of the brain. Part of this is due to the fact that 
dendrites span all cortical layers in all regions of the cerebral cortex [1-3]. Thus, 
when attempting to model artificial brain networks, one cannot ignore dendrites, 
which make up more than 50% of the neuron’s membrane. This is especially 
true in light of the fact that some researchers have proposed that dendrites, and 
not the neurons, are the elementary computing devices of the brain, capable of 
implementing such logical functions as AND, OR, and NOT [1-9]. 

Current ANN models, and in particular perceptrons, do not include dendritic 
structures. As a result, problems occur that may be easily preventable when in- 
cluding dendritic computing. For example, M. Gori and F. Scarselli have shown 
that multilayer perceptrons (MLPs) are not adequate for pattern recognition 
and verification [10]. Specifically, they proved that multilayer perceptrons with 
sigmoidal units and a number of hidden units less than or equal to the number 
of input units, are unable to model patterns distributed in typical clusters. The 
reason is that these networks draw open separation surfaces in pattern space. 
In this case, all patterns not members of the cluster but contained in an open 
area determined by the separation surfaces will be misclassified. This situation 
is depicted in Fig. 2. When using more hidden units than input units, the
separation surface may be closed but, unfortunately, determining whether or not the
perceptron draws closed separation surfaces in pattern space is NP-hard. This
is quite opposite to what is commonly believed and reported in the literature.
The network model described in this paper does not suffer from these problems. 

Gori’s and Scarselli’s result was one reason for trying to use lattice algebra 
operations in perceptrons. Another reason is that lattice operations have proven 
quite successful in the area of associative memories as well as some pattern clas- 
sification tasks [11-22]. Earlier attempts at morphological perceptrons did not 
include the notion of dendritic computing and were restricted to two-class prob- 
lems. The lack of dendrites required hidden layers and computationally intensive 
Fig. 2. In a trained MLP, the separation surface (dotted) between two clusters (black
and white circles) may be open and include impostor patterns (triangles) (a). A
closed surface avoids this problem and is desired (b)



algorithms that did not provide for easy generalization to multiclass pattern sep- 
aration. In contrast, the model defined in this paper uses dendritic computation, 
requires no hidden layers, is capable of multi-class separation to within any de- 
sired degree of accuracy, has the ability to produce closed separation surfaces 
between pattern clusters, and generalizes to fuzzy pattern recognition. 

2 Morphological Perceptrons Based on Dendritic 
Computation 

Let $N_1, \ldots, N_n$ denote a collection of neurons with morphology as shown in
Fig. 1. Suppose these neurons provide synaptic input to another collection
$M_1, \ldots, M_m$ of neurons also having processes as depicted in Fig. 1. The value of a
neuron $N_i$ ($i = 1, \ldots, n$) propagates through its axonal tree all the way to the
terminal branches that make contact with the neuron $M_j$ ($j = 1, \ldots, m$). The
weight of an axonal branch of neuron $N_i$ terminating on the $k$th dendrite of
$M_j$ is denoted by $w_{ijk}^{\ell}$, where the superscript $\ell \in \{0, 1\}$ distinguishes between
excitatory ($\ell = 1$) and inhibitory ($\ell = 0$) input to the dendrite. The $k$th dendrite
of $M_j$ will respond to the total input received from the neurons $N_1, \ldots, N_n$ and
will either accept or inhibit the received input. The computation of the $k$th
dendrite of $M_j$ is given by

$$\tau_k^j(x) = p_{jk} \bigwedge_{i \in I(k)} \bigwedge_{\ell \in L(i)} (-1)^{1-\ell} \left( x_i + w_{ijk}^{\ell} \right), \qquad (1)$$

where $x = (x_1, \ldots, x_n)$ denotes the input value of the neurons $N_1, \ldots, N_n$ with
$x_i$ representing the value of $N_i$, $I(k) \subseteq \{1, \ldots, n\}$ corresponds to the set of
all input neurons with terminal fibers that synapse on the $k$th dendrite of $M_j$,
$L(i) \subseteq \{0, 1\}$ corresponds to the set of terminal fibers of $N_i$ that synapse on
the $k$th dendrite of $M_j$, and $p_{jk} \in \{-1, 1\}$ denotes the excitatory ($p_{jk} = 1$) or
inhibitory ($p_{jk} = -1$) response of the $k$th dendrite of $M_j$ to the received input.

It follows from the formulation $L(i) \subseteq \{0, 1\}$ that the $i$th neuron $N_i$ can have
at most two synapses on a given dendrite $k$. Also, if the value $\ell = 1$, then the
input $(x_i + w_{ijk}^{1})$ is excitatory, and inhibitory for $\ell = 0$ since in this case we have
$-(x_i + w_{ijk}^{0})$.
Fig. 3. Morphological perceptron with dendritic structure. Terminations of excitatory
and inhibitory fibers are marked with • and ○, respectively. Symbol $D_{jk}$ denotes
dendrite $k$ of $M_j$ and $K_j$ its number of dendrites. Neuron $N_i$ can synapse $D_{jk}$ with
excitatory or inhibitory fibers, e.g. weights $w_{1jk}^{1}$ and $w_{nj2}^{0}$ respectively denote
excitatory and inhibitory fibers from $N_1$ to $D_{jk}$ and from $N_n$ to $D_{j2}$



The value $\tau_k^j(x)$ is passed to the cell body and the state of $M_j$ is a function
of the input received from all its dendrites. The total value received by $M_j$ is
given by

$$\tau^j(x) = p_j \bigwedge_{k=1}^{K_j} \tau_k^j(x), \qquad (2)$$

where $K_j$ denotes the total number of dendrites of $M_j$ and $p_j = \pm 1$ denotes
the response of the cell body to the received dendritic input. Here again, $p_j = 1$
means that the input is accepted, while $p_j = -1$ means that the cell rejects
the received input. The next state of $M_j$ is then determined by an activation
function $f$, namely $y_j = f(\tau^j(x))$. In this exposition we restrict our discussion
to the hard-limiter

$$f(\tau^j(x)) = \begin{cases} 1 & \text{if } \tau^j(x) \ge 0 \\ 0 & \text{if } \tau^j(x) < 0 \end{cases} \qquad (3)$$

unless otherwise stated. The total computation of $M_j$ is, therefore, given by

$$y_j(x) = f\left( p_j \bigwedge_{k=1}^{K_j} \left[ p_{jk} \bigwedge_{i \in I(k)} \bigwedge_{\ell \in L(i)} (-1)^{1-\ell} \left( x_i + w_{ijk}^{\ell} \right) \right] \right). \qquad (4)$$



Figure 3 provides a graphical representation of this model. 

A single layer morphological perceptron (SLMP) is a special case of this 
model. Here the neurons $N_1, \ldots, N_n$ would denote the input neurons and the
neurons $M_1, \ldots, M_m$ the output neurons. For SLMPs we allow
$x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. That is, the value $x_i$ of the $i$th input neuron $N_i$ need not be binary.
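
The computation in (1)-(4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the `(i, l, w)` tuple layout for synapses, and the sample values a = 2 and b = 5 are assumptions made here, and the lattice operator $\wedge$ is taken to be the minimum. The sample network has one excitatory fiber with weight $-a$ and one inhibitory fiber with weight $-b$ on a single dendrite, which yields the interval detector $(x - a) \wedge -(x - b)$.

```python
def dendrite(x, synapses, p_jk=1):
    """Eq. (1): p_jk * min over synapses of (-1)^(1-l) * (x_i + w).

    synapses is a list of (i, l, w): input index i, fiber type l
    (1 = excitatory, 0 = inhibitory) and weight w.
    """
    terms = [(1 if l == 1 else -1) * (x[i] + w) for i, l, w in synapses]
    return p_jk * min(terms)


def neuron_total(x, dendrites, p_j=1):
    """Eq. (2): p_j * min over all dendrites; dendrites is a list of (synapses, p_jk)."""
    return p_j * min(dendrite(x, syn, p_jk) for syn, p_jk in dendrites)


def hard_limiter(tau):
    """Eq. (3): fire (1) when the total input is non-negative."""
    return 1 if tau >= 0 else 0


def slmp_output(x, dendrites, p_j=1):
    """Eq. (4): state of one output neuron for input vector x."""
    return hard_limiter(neuron_total(x, dendrites, p_j))


# Assumed sample values: one dendrite, excitatory weight -a, inhibitory weight -b.
a, b = 2.0, 5.0
interval_neuron = [([(0, 1, -a), (0, 0, -b)], 1)]
print([slmp_output([x], interval_neuron) for x in (1.9, 2.0, 3.5, 5.0, 5.1)])
# prints [0, 1, 1, 1, 0]
```

Representing each dendrite as a plain list of synapses keeps the sketch close to the index sets $I(k)$ and $L(i)$: a synapse that is absent is simply not listed.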
Fig. 4. The output neuron $M$ will fire ($y = 1$) for input values from the interval $[a, b]$;
if $x \in \mathbb{R} \setminus [a, b]$, then $y = 0$

3 Examples 

Having defined the computational model of dendritic processes and SLMPs, it 
will be instructive to provide a few examples in order to illustrate the computa- 
tional capabilities of the proposed model. 

Example 1. The simplest case occurs when the SLMP consists of just one input
neuron $N$ and one output neuron $M$ with a single dendrite having an excitatory
($p_{jk} = 1$) response to the received input. Here the notation can be simplified
by discarding subscripts $i$, $j$ and $k$ (as being all 1). Figure 4 illustrates such
an SLMP. If the response of the cell body is $p = 1$, then by (1) and (2) we have
$\tau(x) = (x - a) \wedge -(x - b)$, and $\tau(x) \ge 0 \Leftrightarrow x \ge a$ and $x \le b$. Hence, the
output neuron $M$ will fire ($y = f(\tau(x)) = 1$) when $x \in [a, b]$.

Example 2. The SLMP depicted in Fig. 5 also consists of one input neuron
and one output neuron, but this time the output neuron has three dendrites,
$D_1$, $D_2$ and $D_3$. The corresponding network parameters are given in Table 1.
For algebraic consistency as well as numerical computation when using (1) and
(4), unused terminal fibers with a hypothetical excitatory or inhibitory input
will be assigned a weight of $+\infty$ or $-\infty$, respectively. If $a < b < c < d$, by
substituting the values of the synaptic weights in (1) and (2), we obtain
$\tau(x) = [(x - a)] \wedge -[(x - a) \wedge -(x - b)] \wedge -[(x - c) \wedge -(x - d)]$.
The output neuron $M$ will fire when
$\tau(x) \ge 0 \Leftrightarrow x \in \{a\} \cup [b, c] \cup [d, \infty)$, as depicted on the axis at
the bottom of Fig. 5.
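
The firing set of Example 2 can be checked numerically. The sketch below is an illustration under stated assumptions: the values a, b, c, d = 1, 2, 3, 4 are arbitrary, the helper names are hypothetical, and the weights and responses are inferred from the expression for $\tau(x)$ rather than copied from Table 1. Each dendrite is stored as (excitatory weight, inhibitory weight, dendrite response), and the unused inhibitory fiber of $D_1$ gets weight $-\infty$ so that it never affects the minimum.

```python
INF = float("inf")


def tau(x, dendrites):
    """Total input of M with cell-body response p = 1.

    Each dendrite contributes p_k * min(x + w1, -(x + w0)).
    """
    return min(p_k * min(x + w1, -(x + w0)) for w1, w0, p_k in dendrites)


def fires(x, dendrites):
    """Hard limiter: 1 when tau(x) >= 0, else 0."""
    return 1 if tau(x, dendrites) >= 0 else 0


# Assumed sample values with a < b < c < d.
a, b, c, d = 1.0, 2.0, 3.0, 4.0
example2 = [
    (-a, -INF, 1),  # D1:  (x - a); unused inhibitory fiber has weight -inf
    (-a, -b, -1),   # D2: -[(x - a) ^ -(x - b)]
    (-c, -d, -1),   # D3: -[(x - c) ^ -(x - d)]
]

for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0):
    print(x, fires(x, example2))
```

With these values the neuron fires at x = 1, on [2, 3], and for x >= 4, and stays silent on the open gaps (1, 2) and (3, 4), which matches the set $\{a\} \cup [b, c] \cup [d, \infty)$.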

Example 3. An SLMP with two input neurons, $N_1$ and $N_2$, and two output
neurons, $M_1$ and $M_2$, can be used to solve the XOR problem, formulated as a
two-class problem. Figure 6 illustrates such an SLMP. If $y = (y_1, y_2)$, where $y_j$
denotes the output signal of neuron $M_j$ ($j = 1, 2$), then the desired network
output is:

$$y = \begin{cases} (1, 0) & \text{if } x \in C_1 \\ (0, 1) & \text{if } x \in C_2 \\ (0, 0) & \text{if } x \in \mathbb{R}^2 \setminus (C_1 \cup C_2). \end{cases} \qquad (5)$$
Fig. 5. The output neuron $M$ will fire ($y = 1$) for input values from the set
$X = \{a\} \cup [b, c] \cup [d, \infty)$; if $x \in \mathbb{R} \setminus X$, then $y = 0$

Table 1. Weights and Synaptic Responses, Ex. 2 
As in classical perceptron theory, solving this problem requires two output
neurons. However, in contrast to the classical model, no hidden layer is necessary 
for the morphological perceptron to solve the problem. 

In this case, $y = (f(\tau^1(x)), f(\tau^2(x)))$, where
$\tau^j(x) = p_j \bigwedge_{k=1}^{K_j} \tau_k^j(x)$ denotes
the computation performed by $M_j$, and $K_j$ denotes the number of dendrites of
$M_j$. The values of the axonal branch weights $w_{ijk}^{\ell}$ and output responses $p_{jk}$ are
specified in Table 2.



4 Computational Capability of an SLMP 

Analogous to the classical single layer perceptron (SLP) with one output neuron, 
a single layer morphological perceptron (SLMP) with one output neuron also 
consists of a finite number of input neurons that are connected via axonal fibers 
to the output neuron. However, in contrast to an SLP, the output neuron of an 
SLMP has a dendritic structure and performs the lattice computation embodied 
by (4). Figure 3 provides a pictorial representation of a general SLMP with a 
single output neuron. As the examples of the preceding section illustrate, the 
computational capability of an SLMP is vastly different from that of an SLP as 
well as that of classical perceptrons in general. No hidden layers were necessary 
to solve the XOR problem or to specify the points of the non-convex region of 
Fig. 7. Observing differences by examples, however, does not provide an answer as
Fig. 6. SLMP that solves the two-class XOR problem for points from the domain
$x = (x_1, x_2) \in \mathbb{R}^2$

Table 2. Two-Class XOR Network Parameters, Ex. 3 
to the specific computational capabilities of an SLMP with one output neuron.
Such an answer is given by the following two theorems. 

Theorem 1. If $X \subset \mathbb{R}^n$ is compact and $\varepsilon > 0$, then there exists a single layer
morphological perceptron that assigns every point of $X$ to class $C_1$ and every
point $x \in \mathbb{R}^n$ to class $C_0$ whenever $d(x, X) > \varepsilon$.

The expression $d(x, X)$ in Theorem 1 refers to the distance of the point
$x \in \mathbb{R}^n$ to the set $X$. Figure 7 illustrates this concept. All points of $X$ will be classified
as belonging to class $C_1$ and all points outside the banded region of thickness $\varepsilon$
will be classified as belonging to class $C_0$. Points within the banded region may
be misclassified. As a consequence, any compact configuration, whether it is
convex or non-convex, connected or not connected, or contains a finite or infinite
Fig. 7. The compact region $X$ (shaded) and the banded region of thickness $\varepsilon$ (dashed)



number of points, can be approximated to any desired degree of accuracy $\varepsilon > 0$
by an SLMP with one output neuron. 

The proof of Theorem 1 requires tools from elementary point set topology and 
is given in [23] . Although the proof is an existence proof, part of it is constructive 
and provides the basic idea for our training algorithms. 
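
For a finite set $X$, the statement of Theorem 1 can be illustrated with a simple construction. The sketch below is not the proof from [23]; it is an assumption-laden illustration in which each point of $X$ gets one dendrite that detects a small axis-parallel box around it, and both the dendrite responses and the cell-body response are set to $-1$ so that the neuron computes the union of the boxes. The half-width $\varepsilon / \sqrt{n}$ is chosen here so that every box lies within Euclidean distance $\varepsilon$ of its center; the function names are hypothetical.

```python
import math


def box_dendrite(x, center, h):
    """Dendrite with response p_jk = -1 around `center` with half-width h.

    The inner minimum is >= 0 exactly when x lies in the closed box.
    """
    inside = min(
        min(xi - (ci - h), -(xi - (ci + h))) for xi, ci in zip(x, center)
    )
    return -inside


def classify(x, points, eps):
    """Return 1 (class C1) if x falls in some box, else 0 (class C0)."""
    h = eps / math.sqrt(len(x))  # box fits inside the Euclidean eps-ball
    tau = -min(box_dendrite(x, c, h) for c in points)  # cell body p_j = -1
    return 1 if tau >= 0 else 0


# Hypothetical finite pattern set X and tolerance eps.
X = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0)]
eps = 0.5
print([classify(p, X, eps) for p in X])   # every point of X is accepted
print(classify((2.0, 2.0), X, eps))       # farther than eps from X: rejected
```

Points closer than $\varepsilon$ to $X$ may fall on either side, which corresponds to the banded region in which the theorem makes no promise.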

Theorem 2 is a generalization of Theorem 1 to multiple sets. Suppose $X_1$,
$X_2, \ldots, X_m$ denotes a collection of disjoint compact subsets of $\mathbb{R}^n$. The goal
is to classify, $\forall j = 1, \ldots, m$, every point of $X_j$ as a point belonging to class $C_j$
and not belonging to class $C_i$ whenever $i \ne j$. For each $p \in \{1, \ldots, m\}$, define
$Y_p = \bigcup_{j=1, j \ne p}^{m} X_j$. Since each $Y_p$ is compact and $Y_p \cap X_p = \emptyset$,
$\varepsilon_p = d(X_p, Y_p) > 0$ $\forall p = 1, \ldots, m$. Let $\varepsilon_0 = \frac{1}{2} \min\{\varepsilon_1, \ldots, \varepsilon_m\}$.

Theorem 2. If $\{X_1, X_2, \ldots, X_m\}$ is a collection of disjoint subsets of $\mathbb{R}^n$ and
$\varepsilon$ a positive number with $\varepsilon < \varepsilon_0$, then there exists a single layer morphological
perceptron that assigns each point $x \in \mathbb{R}^n$ to class $C_j$ whenever $x \in X_j$ and
$j \in \{1, \ldots, m\}$, and to class $C_0 = \neg \bigcup_{j=1}^{m} C_j$ whenever $d(x, X_i) > \varepsilon$, $\forall i = 1, \ldots, m$.
Furthermore, no point $x \in \mathbb{R}^n$ is assigned to more than one class.

Figure 8 illustrates the conclusion of Theorem 2 for the case $m = 3$. The proof
of this theorem is somewhat lengthy and, because of page limitations, could not be
included; it is given in [24]. Based on the proofs of these two theorems, we
constructed training algorithms for SLMPs [23,24]. During the learning phase, 
the output neurons grow new dendrites and input neurons expand their axonal 
branches to terminate on the new dendrites. The algorithms always converge,
and they converge rapidly compared to backpropagation learning in
traditional perceptrons.

These training algorithms are similar in that they all dynamically grow den- 
drites and axonal fibers during the learning phase, which will use the patterns 
of the training set in just one iteration (one epoch) . The algorithms differ in the 
strategy of partitioning the pattern space, by either growing a class region by 
merging smaller hyper-boxes, or reducing an initial large box through elimination 
Fig. 8. The compact region $X$ (shaded) and the banded region of thickness $\varepsilon$ (dashed)



of foreign patterns and smaller regions that enclose them. Also, as a consequence 
of the aforementioned theorems on which the algorithms are based, the trained 
SLMPs will always correctly recognize 100% of the patterns in the training set. 
The next examples illustrate the performance of these training algorithms for 
some well-known problems. 
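
The region-elimination strategy can be sketched as follows. This is a deliberately simplified illustration and not the training algorithm of [23,24]: it assumes a single bounding box for the class, carves a fixed-size box (half-width `delta`, a hypothetical parameter) around every foreign pattern inside it, and assumes `delta` is smaller than the max-norm distance between any class pattern and any foreign pattern. One excitatory dendrite represents the enclosing box and one inhibitory dendrite represents each carved box, so all training patterns are classified correctly after a single pass.

```python
def in_box(x, lo, hi):
    """Lattice value min_i[(x_i - lo_i) ^ -(x_i - hi_i)]; >= 0 inside the closed box."""
    return min(min(xi - l, -(xi - h)) for xi, l, h in zip(x, lo, hi))


def train_elimination(class_pts, foreign_pts, delta):
    """One pass: enclosing box of the class plus carved boxes around intruders."""
    n = len(class_pts[0])
    lo = [min(p[i] for p in class_pts) for i in range(n)]
    hi = [max(p[i] for p in class_pts) for i in range(n)]
    carved = [
        ([q[i] - delta for i in range(n)], [q[i] + delta for i in range(n)])
        for q in foreign_pts
        if in_box(q, lo, hi) >= 0  # only foreign patterns inside the big box
    ]
    return (lo, hi), carved


def classify(x, model):
    """Output neuron with cell-body response +1 and hard-limiter activation."""
    (lo, hi), carved = model
    terms = [in_box(x, lo, hi)]                     # excitatory dendrite
    terms += [-in_box(x, l, h) for l, h in carved]  # inhibitory dendrites
    return 1 if min(terms) >= 0 else 0


# Hypothetical training data.
class_pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
foreign_pts = [(2.0, 2.0), (9.0, 9.0)]
model = train_elimination(class_pts, foreign_pts, delta=0.5)
print([classify(p, model) for p in class_pts + foreign_pts])
```

The number of dendrites grows with the number of intruding foreign patterns, which mirrors the way the described algorithms grow dendrites during learning.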

Example 4. A nontrivial benchmark for testing the performance of a training
algorithm is the well known problem of the two intertwined spirals [25]. For
our tests, we used two Archimedean spirals defined by the complex expression
$z_c(\theta) = x_c(\theta) + i\,\alpha\,y_c(\theta) = (-1)^{\ldots}\,\ldots$, where $i = \sqrt{-1}$, $c \in \{0, 1\}$ denotes
the spiral class label, $\theta$ is the angle in radians, $\alpha > 0$ denotes the aspect ratio
between the $x_c$ and $y_c$ coordinates, and $\rho > 0$ is a constant that controls the
spread of the spiral turns.

This problem was used to test the performance of one SLMP training algo- 
rithm that we developed, which is tailored to handle data sets where the patterns 
are points on a curve in 2-D space. The data set consists of 192 patterns, 96 on 
each spiral, 75% of which were used for training and 25% for test, selected at 
random from the entire set. During a typical training session, the algorithm 
grew 163 dendrites (161 excitatory and 2 inhibitory). Recognition of the test 
patterns was 100% correct. The class $C_1$ region learned by the SLMP is
illustrated in Fig. 9. In the figure, each small rectangle represents an elementary
area recognized by an individual dendrite. The solid-line rectangles correspond
to excitatory dendrites; regions of inhibitory dendrites are drawn with dashed
lines. The learned class $C_1$ region is the union of the solid-line rectangles minus
the dashed-line rectangles.

Example 5. Another data set we used is the one considered in [26] to test their 
simulation of a radial basis function network. The data set consists of two non- 
linearly separable classes of 10 patterns each. This pattern set was used as input 
by two other training algorithms that we developed. The former algorithm uses 
region merging, but assumes no a priori distribution of the patterns as did the 
Fig. 9. The two-spirals problem. Each spiral consists of 96 patterns, 75% of which were
used for training. Class $C_1$ patterns are marked with filled circles (training) and circled
dots (test); class $C_2$ patterns with empty circles (training) and double circles (test).
The shaded area is the learned class $C_1$ region; recognition is 100% correct



method mentioned in Example 4. The latter algorithm uses region elimination, 
i.e. it starts training by drawing an enclosing large hyper-box and proceeds by 
eliminating smaller regions around patterns that do not belong to the class corre- 
sponding to the enclosing hyper-box. The results of the two training algorithms 
are illustrated in Fig. 10 and 11, respectively. In both cases, all patterns were 
used for both training and test, and classification was 100% correct, as expected. 

For comparison, Fig. 12 depicts the results after training a classical MLP on
the same data set. The dotted lines represent the decision boundaries learned
after 2000 epochs by a two-layer MLP with 13 nodes in the hidden layer, using
backpropagation as the training algorithm. Figure 12 shows that the separation
surfaces learned by the MLP are open, in contrast to the separation surfaces of
SLMPs, which are guaranteed to be closed. Furthermore, convergence of the
MLP is much slower than that of the SLMP, even for this small data set
of 20 patterns.



5 Remarks on Fuzzy Computing and Inhibitory Neurons 

In our SLMP model the values of the output neurons are always crisp, i.e. having 
either value 1 or 0. In many application domains it is often desirable to have 
fuzzy valued outputs in order to describe such terms as very tall, tall, fairly 
tall, somewhat tall, and not tall at all. Obviously, the boundaries between these 
concepts cannot be exactly quantified. In particular, we would like to have output 
values $y_j(x)$ such that $0 \le y_j(x) \le 1$, where $y_j(x) = 1$ if $x$ is a clear member



Neurons, Dendrites, and Pattern Classification 1 1 




Fig. 10. The shaded area is the class $C_1$ region learned by the merging version of the
SLMP training algorithm applied to this problem of two nonlinearly separable classes.
Patterns of the two classes are marked with • and ○, respectively. The algorithm grows
20 dendrites (19 excitatory and 1 inhibitory, dashed); recognition is 100% correct



of class $C_j$ and $y_j(x) = 0$ whenever $x$ has no relation to class $C_j$. However, we
would like to say that $x$ is close to full membership of class $C_j$ the closer the
value of $y_j(x)$ is to value 1. To illustrate how the SLMP can be extended to
produce fuzzy outputs, we reconsider Example 1 of Section 3.

Suppose we would like to have every point in the interval $[a, b] \subset \mathbb{R}$ to be
classified as belonging to class $C_1$ and every point outside the interval $[a - \alpha, b + \alpha]$
as having no relation to class $C_1$, where $\alpha > 0$ is a specified fuzzy boundary
parameter. For a point $x \in [a - \alpha, a]$ or $x \in [b, b + \alpha]$ we would like $y(x)$ to be
close to 1 when $x$ is close to $a$ or $b$, and $y(x)$ close to 0 whenever $x$ is close to $a - \alpha$
or $b + \alpha$. In this case we simply convert the input $x \in \mathbb{R}$ to a new input format
$x/\alpha$. If $w_1^0 = -b$ and $w_1^1 = -a$ denote the weights found either by inspection or the
aforementioned algorithms for input $x$, then set $v_1^0 = w_1^0/\alpha - 1$ and $v_1^1 = w_1^1/\alpha + 1$
for the weights of the new input $x/\alpha$ and use the ramp activation function

$$f(z) = \begin{cases} 1 & \text{if } z \ge 1 \\ z & \text{if } 0 \le z \le 1 \\ 0 & \text{if } z \le 0 \end{cases} \qquad (6)$$

Computing $\tau(x/\alpha)$ we obtain
$\tau(x/\alpha) = (x/\alpha + v_1^1) \wedge -(x/\alpha + v_1^0) = [\frac{1}{\alpha}(x - a) + 1] \wedge [-\frac{1}{\alpha}(x - b) + 1]$. Thus,

$$y(x) = f(\tau(x/\alpha)) = \begin{cases} 1 & \text{if } x \in [a, b] \\ 0 < \tau(x/\alpha) < 1 & \text{if } x \in (a - \alpha, a) \cup (b, b + \alpha) \\ 0 & \text{if } x \notin (a - \alpha, b + \alpha) \end{cases} \qquad (7)$$



Equation (7) is illustrated in Fig. 13 and the network in Fig. 14. By
choosing fuzzy factors $\alpha_i$ for each $x_i$, it is intuitively clear how this example
generalizes to pattern vectors $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$.
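
The fuzzy extension of Example 1 is easy to check numerically. The following sketch assumes the interval weights $-a$ and $-b$ of Example 1 rescaled by $\alpha$ and shifted by $\pm 1$, together with a ramp activation clipped to $[0, 1]$; the function names and the sample values a = 2, b = 5, alpha = 2 are illustrative choices made here, not values from the paper.

```python
def ramp(z):
    """Ramp activation: 0 below 0, identity on [0, 1], 1 above 1."""
    return max(0.0, min(1.0, z))


def fuzzy_output(x, a, b, alpha):
    """Fuzzy membership of x in [a, b] with fuzzy boundary width alpha."""
    xi = x / alpha             # rescaled input x / alpha
    v1 = -a / alpha + 1.0      # excitatory weight: w1 / alpha + 1, with w1 = -a
    v0 = -b / alpha - 1.0      # inhibitory weight: w0 / alpha - 1, with w0 = -b
    tau = min(xi + v1, -(xi + v0))
    return ramp(tau)


# Assumed sample values.
a, b, alpha = 2.0, 5.0, 2.0
for x in (0.0, 1.0, 2.0, 3.5, 5.0, 6.0, 7.0, 8.0):
    print(x, fuzzy_output(x, a, b, alpha))
```

The membership is 1 on [2, 5], falls linearly to 0.5 at x = 1 and x = 6, and reaches 0 at x = 0 and x = 7, which is the trapezoidal shape the ramp function is meant to produce.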
Fig. 11. The same problem as in Fig. 10, this time solved using the elimination version
of the SLMP training algorithm. Only 4 dendrites of the class $C_1$ output neuron are
sufficient to partition the pattern space similarly to the partitioning learned by a
13-hidden unit MLP, but with closed surfaces. Recognition is again 100% correct



One assumption made in our model is that a neuron Ni can provide both 
excitatory as well as inhibitory input to a neuron Mj. This assumption has no 
foundation in biology. In real neural networks, a neuron can send only excitatory 
or only inhibitory signals to other neurons. Neurons that act as inhibitors on 
other neurons are called inhibitory neurons. It is interesting to observe that all 
the examples given in this exposition can be expressed in terms of networks 
consisting entirely of excitatory and inhibitory neurons. As an illustration, let 
us again consider Example 1 of Section 3. In this case there are several ways of 
adding an inhibitory neuron. For example, we can add an inhibitory neuron $N_2$ so
that we now have two input neurons $N_1$ and $N_2$, one sending only excitatory and
the other only inhibitory inputs to the output neuron $M$, as shown in Fig. 15(a).
In this case the axonal weights of the excitatory and inhibitory neuron are
$w_1^1 = -a$ and $w_2^0 = -b$, respectively. Obviously,
$\tau(x) = (x - a) \wedge -(x - b) \ge 0 \Leftrightarrow a \le x \le b$
and, therefore, the output of $M$ is 1 if and only if $x \in [a, b]$. The downside
of this approach is that the network topology has become a bit more complex
in that we are now dealing with two input neurons. If only one input neuron is
desired, then $N_2$ can be partially hidden as illustrated in Fig. 15(b). In this case,
$N_1$ sends excitatory signals to $M$ and $N_2$, and $N_2$ sends inhibitory signals to $M$.
Since $N_2$ is not an input neuron, its states are binary; the activation function
for $N_2$ is a hard limiter of the form






$$g(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \le 0 \end{cases} \qquad (8)$$

and its axonal weight is $w^0 = -0.5$. The weights of the input neuron's
fibers that terminate on the single dendrites of $N_2$ and $M$ are $-b$ and $-a$,
respectively. If $\tau(x)$ and $\tau^2(x)$ denote the total received inputs of $M$
and $N_2$, then $M$ fires if and only if $\tau(x) = (x - a) \wedge -[g(\tau^2(x)) - 0.5] \ge 0$. Thus,
Fig. 12. Decision boundaries learned by a two-layer perceptron with 13 nodes in the
hidden layer, using backpropagation. The thin space between the boundaries represents
a region of uncertainty. Note that the separation surfaces are open, and compare with
the regions learned by an SLMP in Fig. 10 and 11




Fig. 13. Illustration of computing fuzzy output values 



the output of $M$ has value 1 if and only if $a \le x \le b$. Although this network also
solves the problem of having only excitatory and inhibitory neurons, it is again 
more complex than the two-neuron model of Example 1. Even considering more 
complex pattern recognition problems, we have seen no mathematical advantage 
thus far in using inhibitory neurons. This does not imply that future research will 
not discover more powerful neural networks with dendritic structures consisting 
of only excitatory and inhibitory neurons. This facet of our investigations remains 
an active area of research. 
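
The network with a semi-hidden inhibitory neuron can be traced step by step. The sketch below is an illustration under assumptions: it takes the fiber from $N_1$ to $N_2$ to have weight $-b$, the fiber from $N_1$ to $M$ to have weight $-a$, the inhibitory fiber from $N_2$ to $M$ to have weight $-0.5$, and a strict threshold for $N_2$; the function names and the sample values a = 2, b = 5 are hypothetical.

```python
def g(z):
    """Hard limiter of the semi-hidden neuron N2 (strict threshold)."""
    return 1 if z > 0 else 0


def f(z):
    """Hard limiter of the output neuron M."""
    return 1 if z >= 0 else 0


def output_M(x, a, b):
    """Output of M when N1 is purely excitatory and N2 purely inhibitory."""
    state_N2 = g(x - b)                   # N2 switches on only for x > b
    excitatory = x - a                    # fiber N1 -> M with weight -a
    inhibitory = -(state_N2 - 0.5)        # fiber N2 -> M with weight -0.5
    return f(min(excitatory, inhibitory))


# Assumed sample values.
a, b = 2.0, 5.0
print([output_M(x, a, b) for x in (1.9, 2.0, 3.5, 5.0, 5.1)])
# prints [0, 1, 1, 1, 0]
```

The inhibitory term equals 0.5 while $N_2$ is silent and $-0.5$ once it fires, so $M$ is switched off exactly when $x$ exceeds $b$, reproducing the interval $[a, b]$ of Example 1.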

6 Conclusions 

We presented a new paradigm for neural computation that is based on lattice 
algebra and takes into account synaptic responses as well as computations in 
dendrites. The training algorithms that we developed grow new axonal fibers 
as well as dendritic structures during the learning phase. These facets of our 
model are more in agreement with current understanding of cerebral neural 
networks than current fashionable ANNs. The theorems that we established as 
Fig. 14. The modified network of Example 1




Fig. 15. Examples of two different SLMPs using only excitatory and inhibitory neu- 
rons. In (a) two input neurons are required, while in (b) a semi-hidden neuron is 
required 



well as the examples presented in this paper make it obvious that an SLMP with 
just one output neuron is far more powerful as a pattern recognizer than the 
traditional single layer perceptron with one output neuron or a perceptron with 
one output neuron and one hidden layer. In fact, our training algorithms always 
draw a closed surface around the training set, thus preventing the problems of 
traditional perceptrons discussed in the Introduction. 

We also indicated how our model can be generalized to include fuzzy compu- 
tation. This remains an area of further research and applications. Additionally, 
we discussed the problem of employing only excitatory and inhibitory neurons 
in our model. As mentioned, this remains an active area of research and we hope 
that other researchers will join us in further exploration of these problems in 
order to bring ANNs into closer relationship with biological neural networks. 
Abstract. In this paper we present a summary of some of the research that we
are developing in the Institute of Robotics of the CSIC-UPC, in the field of 
Learning and Robot Vision for autonomous mobile robots. We describe the 
problems that we have found and some solutions that have been applied in two 
issues: tracking objects and learning and recognition of 3D objects in robotic 
environments. We will explain some of the results accomplished. 



1 Introduction 

Computer vision in autonomous mobile robotics is a very well known topic that is 
being treated by many research groups [1]. However, the use of perception techniques 
to automatically learn and recognize the environment and the objects located in it is
probably not so well known. One part of our research has concentrated on the
development of techniques to capture and process the information that surrounds a
robot, taking into account that this information can be captured by diverse perception 
sensors (colour video cameras, stereo vision, laser telemeter, ultrasonic sensors, etc.) 
and the sensors related to robot movement (odometers). 

We have focused our research on the development of “robust” techniques that must be,
as far as possible, “invariant” to illumination, colour, surface reflectance, sensor
uncertainty, dead reckoning and dynamic environments. However, this is not
always achievable. We also orient our research towards techniques for learning the
perceptive world, in order to create a database that robots can use later on.

In this paper we describe our research on two topics in the field: adaptive learning and
tracking of moving objects; and learning and recognition of 3D objects in the
environment. Although these topics lead to different techniques and methodologies,
they share the same perception information formats: colour images and depth maps.

However, we also use other kinds of perception information, which are
captured by means of stereo vision, laser telemeter, ultrasonic and odometer sensors.
The diverse information captured by these sensors is combined to obtain redundancy
and thereby improve the robustness of the techniques.

Before explaining the methods, we start by describing the typical “problems” that
we find in the perception of a dynamic environment where the robot or the objects are
moving.



2 Common Problems in the Acquisition of Perception
Information Based on Colour Images and Depth Maps

Colour is a visual feature commonly used in object detection and tracking
systems, especially in the field of human-computer interaction. When the
environment is relatively simple, with controlled lighting conditions and an
uncluttered background, colour can be considered a robust cue. The problem appears
when we are dealing with scenes with varying illumination conditions, varying
camera position and a confusing background.

The colour of an object surface can be modified by several circumstances, which
limits the applicability of colour images in robot vision. The following
issues modify the colour perception of an object surface:

the type of the illumination source, the illumination orientation, the number
and distribution of the sources of illumination,
the surface reflectance and the surface orientation,
the texture of the surface,
and the shadows produced by other objects or by the concavities of the
object itself.

Some of these problems can be diminished in static scenes by controlling the
illumination (for example, for indoor robot environments, the type and position of the
illumination), the object surfaces (for example, by choosing objects with Lambertian
surfaces) or the type of objects (for example, by using convex objects).




Fig. 1. Typical reflectance problems of a colour (red) planar surface: (a) a sequence of a red 
planar surface; (b) RGB map of the colour distribution of the sequence of the planar surface. 

However, although we can impose some of these constraints in indoor environments,
many of the aforementioned drawbacks still persist, due to the relative position and
orientation of the robot sensors, the illumination devices and the scene objects. This
relative position involves not only the passive sensor (colour camera) but also the
illumination sources. For example, the robot can interfere with the illumination of an
object surface by casting its shadow, or a new “virtual” illumination source
can appear due to the reflection from another surface. A typical example of the latter case is
the reflectance of the “ground”.

Other typical problems are due to the camera sensor, for example, optical
aberration and geometrical deformation, the separation of the colour channel bands,
the colour sensitivity, the sensitivity of the sensor to the illumination, and the problems
associated with the shutter speed or the resolution of the camera.




Fig. 2. Some problems with the reflectance of the ground 



The capture of depth information has other drawbacks. In
the case of a laser telemeter, the sensor drawbacks are due to the features of the laser
source, the resolution or the speed of depth acquisition and processing, or the
problems related to partial surface occlusion. If the depth sensors (for example, stereo
vision or laser telemeter) are mounted on the mobile robot, further problems arise,
for example the relative position and orientation of the sensors with respect to the
scene, caused by the “skew” of the elevation, pitch or roll of the cameras and laser
telemeter with respect to the ground.

In addition to the abovementioned problems, uncertainty is always an important issue
in robot perception, and it must be taken into account when
discerning the objects and the limits of the robot environment from sensory data. This
perception uncertainty must be incorporated in the models for robot navigation, object
tracking, object recognition and landmark identification.



3 Developing “Robust Techniques” for Object and Face Tracking

Object and face tracking are two typical problems in robot vision which have been
studied using different types of sensors and techniques [2, 3]. One of the most widely used
sensors is the colour video camera, since it provides enough information to follow an
object and avoid uncertainties. However, in a real unconstrained environment, the
varying illumination conditions, camera position and background create important
problems for the robot tracker. Different approaches have been presented to solve these
problems in robot tracking, but robustness remains an open problem.

The important challenge in colour tracking is the ability to accommodate
variations of the illumination and the environment, that is, the tracker must modify its
parameters depending on the circumstances. However, the use of independent
adaptive techniques is often not enough to cope with the problem, since the
adaptation only takes into account one of the potential variations, for example the
colour reflectance, whereas the variations are usually multivariate. For this reason,
we have studied solutions that combine different techniques to take into account the
multivariate effect.

One of our first approaches combines information on colour changes and depth for
face tracking in real time [4]. The purpose is to follow a face or an object that has
colour and depth continuity, avoiding its loss due to the presence of similar
colours in the background. The technique fuses colour adaptation and stereo vision in
such a way that the tracked object is only analysed on a surface with similar depth
information. The technique uses an ellipse to model the face of a person, similar to the
work of Birchfield [5], and adaptive colour models, for example [21]. The face is
adapted by means of intensity gradients and colour histograms, and the stereo vision
information dynamically adapts the size of the tracked elliptical face. The system uses
the Kalman filter to predict the new position of the cameras and robot, and it runs at
30 Hz, that is, in real time.
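
The fusion of colour adaptation and depth can be illustrated with a minimal sketch. It is not the implementation of [4]: the use of a hue histogram, the blending factor `alpha`, the depth tolerance `tol` and all function names are assumptions made for illustration only. Pixels are first gated by depth, so that similarly coloured background is ignored, and the stored colour model is then blended with the histogram of the surviving pixels.

```python
def hue_histogram(pixels, bins=16):
    """Normalised histogram of hue values in [0, 1)."""
    hist = [0.0] * bins
    for hue in pixels:
        hist[min(int(hue * bins), bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def depth_gate(hues, depths, target_depth, tol=0.3):
    """Keep only the hues of pixels whose depth is close to the target depth."""
    return [h for h, z in zip(hues, depths) if abs(z - target_depth) <= tol]


def adapt_model(model, observed, alpha=0.1):
    """Blend the stored colour model with the newly observed histogram."""
    return [(1.0 - alpha) * m + alpha * o for m, o in zip(model, observed)]


# Toy frame (hypothetical values): two face-coloured pixels at about 1 m and
# one similarly coloured background pixel at 3 m, which the gate discards.
hues = [0.05, 0.06, 0.05]
depths = [1.0, 1.1, 3.0]
gated = depth_gate(hues, depths, target_depth=1.0)
model = hue_histogram([0.05, 0.05])
model = adapt_model(model, hue_histogram(gated))
```

A small `alpha` makes the model follow slow illumination changes while resisting sudden outliers; the value to use in practice would have to be tuned.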

A second approach [6] tries to solve two important problems in object tracking: the
change of colour and the confusing background. As mentioned before, the
colour of an object surface changes with the orientation of the surface (in principle
only the intensity, but due to the illumination conditions and surface reflectance, the
colour can also change). Moreover, if the background is confusing, then tracking
an object surface becomes very difficult. In order to solve these two problems, we
propose a solution based on fusing colour adaptation with shape adaptation. We have
developed a method that, by using the CONDENSATION technique [7], combines
colour histogram adaptation with snake shape adaptation [8]. The
algorithm formulates multiple hypotheses about the estimate of the colour distribution
in the RGB space, and validates them taking into account the contour shape of the
object. This combination produces a very robust technique whose results can be seen
in Fig. 4. The technique is described in detail in [6].
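
The multiple-hypothesis idea behind CONDENSATION can be sketched as one resample, predict and measure cycle. This is only a schematic illustration, not the method of [6]: the state here is a single scalar rather than a colour distribution, the dynamics are a random walk, and `toy_likelihood` is a hypothetical stand-in for the contour-based validation.

```python
import math
import random


def condensation_step(particles, weights, measure, noise=1.0, rng=random):
    """One resample / predict / measure cycle over scalar hypotheses."""
    n = len(particles)
    # 1. Resample hypotheses in proportion to their previous weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: diffuse each hypothesis with random-walk dynamics.
    predicted = [p + rng.gauss(0.0, noise) for p in resampled]
    # 3. Measure: score each hypothesis and renormalise the weights.
    scores = [measure(p) for p in predicted]
    total = sum(scores)
    if total == 0.0:
        new_weights = [1.0 / n] * n
    else:
        new_weights = [s / total for s in scores]
    return predicted, new_weights


def toy_likelihood(x, target=5.0, sigma=1.0):
    """Hypothetical measurement model: Gaussian score around a known target."""
    return math.exp(-0.5 * ((x - target) / sigma) ** 2)


rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(200)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(10):
    particles, weights = condensation_step(particles, weights, toy_likelihood, rng=rng)
estimate = sum(p * w for p, w in zip(particles, weights))
```

In the tracker described above, each hypothesis would instead encode a colour distribution, and the measurement step would score it against the contour shape.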

Fig. 3. The tracking vision system




Fig. 4. Four experiments: (1) tracking of circles that change the colour; (2) tracking an object 
surface with different orientations and illumination; (3) tracking an insect in real environment; 
(4) tracking a snail in real environment 



4 Learning and Identifying Objects in Mobile Robotic
Environments

The process of learning and identifying new 3D objects in robot environments has
been treated using different methodologies, for example [9] [20]; however, these
techniques only work in very constrained environments. Unfortunately, many of the
proposed methods fail in real unstructured environments, due to problems of
illumination, shadows, object and camera position, or confusing background.

In order to overcome some of these problems we are designing our methods taking
into account the following criteria:

the perception features must be, as far as possible, robust and relatively
invariant to changes of the environment,

the representation models must be flexible and must include the statistical
variations of the structure and of the perception features that are intrinsic to
the learning process,

the recognition, or matching, process must be robust against local variations
and has to take into account the problems derived from partial occlusion,

in the recognition process, the matching must be guided to reduce the number
of potential model candidates.

The first criterion is one of the most difficult to satisfy, since the perception features
depend too much on uncontrolled environment conditions. For this reason we have
selected surface colour and surface shape as the basic perception features. The first
one can be obtained from colour images and the second one from depth sensors (for
example, stereo vision and laser telemeter). Achieving invariance of surface colour is a
difficult task, but we are diminishing the influence of colour variations by using colour
constancy methods and statistical information about the feature variations. However, colour constancy
algorithms are not yet giving us the results that we expect, although our new
developments are promising [22]. On the other hand, the surface shape obtained from
the depth sensors is a robust feature.

One of the preliminary works to obtain robust results was the fusion of colour
segmentation and depth to improve the segmentation results. The method [23]
processes the colour segmentation and the depth map independently, and then combines both
outcomes. The idea of the method is to balance over-segmentation and
under-segmentation by joining or splitting the singular areas. Fig. 5 shows the results of this
method on a colour scene.







Fig. 5. Fusion of colour segmentation and depth map to segment a colour scene 

In the rest of this section, we describe the solutions adopted for the representation
models and for the recognition and learning processes. The basic representation models
that we are using are structural representations: chains of symbols and graphs. In the
first case we use cocircuits (from matroid theory) and in the second case we use random
graphs, which combine structural information with statistical information on the
attributes of the nodes and arcs. In this way, we have a representation model that can
be learned directly from the colour images, taking into account the potential variations
of the perception features.

Our research group has developed several methods to learn and recognise 3D objects
described by multiple views in a scene. These methods have been oriented in two
directions: the first aims to reduce the number of candidates in object
recognition by an indexing technique for 3D object hypothesis generation from single
views; the second aims to identify the input object with respect to the
model candidates by looking for the minimum distance measure between the object
and the model candidates. The first direction reduces the number of
potential model candidates to a few, which can be done very quickly. The second
direction identifies the best candidate.



4.1 Indexing Views of 3D Objects 

In the first group of techniques, the idea is to represent a 3D object view by means of 
topological properties of the regions of the segmented image and then to create a table 
with each of the topological representations of the object. Then the identification 
process is based on indexing the input representation of one scene view to the table of 
the topological representations of the 3D object views. 

A topological representation is created using oriented matroid theory by
encoding incidence relations and the relative position of the elements of the segmented
image, and by giving local and global topological information about their spatial
distribution. The result is a set of cocircuits [10] of sign combinations that relate
segmented regions with respect to the convex hull of two selected regions of the
scene. The details of this process are explained in [11, 12]. The set of cocircuits
obtained is projective invariant, which is an important feature for the representation of
the model objects. Fig. 6 shows the segmentation and indexing process for one object
and Table 1 shows the resulting indexes of the object.




Fig. 6. Segmentation and indexing process of two objects

The result of the indexing process looks as follows:

Table 1. Index result of the indexing process for the images of Fig. 6. The first column is the
baseline area to which the segmented regions are related. 0 means the region is inside the
baseline area; - the region is on the left side; + the region is on the right side; and * means the
region does not exist in the segmented image.
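
The sign convention of Table 1 can be sketched with a simple orientation test. This is a simplification made for illustration and not the cocircuit computation of [11, 12]: each region is reduced to its centroid, the baseline area is reduced to the directed line between two centroids, "left" is taken with respect to axes in which y grows upwards, and `side_sign` and `inside_tol` are hypothetical names.

```python
def side_sign(base_a, base_b, region, inside_tol=1e-9):
    """Return '-', '+', '0' or '*' for a region centroid relative to line a -> b."""
    if region is None:  # the region is absent from this segmented view
        return "*"
    (ax, ay), (bx, by), (px, py) = base_a, base_b, region
    # The z component of (b - a) x (p - a) is positive when p lies to the left.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) <= inside_tol:
        return "0"
    return "-" if cross > 0 else "+"


# One index row: four regions related to the baseline from (0, 0) to (1, 0).
regions = [(0.0, 1.0), (0.0, -1.0), (0.5, 0.0), None]
index_row = "".join(side_sign((0.0, 0.0), (1.0, 0.0), p) for p in regions)
```

Because only the sign of the cross product is used, the row does not change under scaling or rotation of the view, which hints at why sign-based indexes are attractive for matching.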

4.2 Learning and Recognising 3D Objects Represented by Multiple Views

In the second group, the idea is to represent 3D object views by means of graphs and
then to obtain the model as the synthesis of the graphs that represent the views of a
3D object. Once the model has been learned, the recognition process is based on
applying a distance measure between the input graph (the graph that encodes the 3D
view of a scene object) and the object models. The input graph is assigned to the
model graph with the minimum distance measure value. Fig. 7 shows the process of
learning (synthesis of the object graph views) and recognition.

Object views are often represented by graphs, and one of the most robust
representations is based on attributed graphs. When a synthesis of these attributed
graphs is required to learn a complete object through its views, a good model
representation is the random graph. The generalisation of these graphs is
called General Random Graphs (GRG), which theoretically have great
representation power, but need a lot of space to store the associated data.
We have defined several simplifications to the GRG to reduce the space and also to
diminish the time complexity of matching graphs. Wong and You
[13] proposed the First-Order Random Graphs (FORGs) with strong simplifications
of the GRG; specifically, they introduce three assumptions about the probabilistic
independence between vertices and arcs, which restrict too much the applicability of
these graphs to object recognition. Later, our group introduced a new class of graphs
called Function-Described Graphs (FDG) [14][15] to overcome some of the problems
of the FORG. The FDG also considers some independence assumptions, but some
useful second-order functions are included to constrain the generalisation of the structure.



Robot Vision for Autonomous Object Learning and Tracking 






Specifically, an FDG includes the antagonism, occurrence and existence relations,
which apply to pairs of vertices and arcs. Finally, we have expanded this
representation [17][18] by means of Second-Order Random Graphs (SORG), which
keep more structural and semantic information than FORGs and FDGs. These last
types of representation have led to the development of synthesis techniques for model
object generation (by means of 3D object views) and graph matching techniques for
graph identification.




Fig. 7. Learning and classification processes in the classifiers that use only one structural
representation per model



In this article we show one example of unsupervised learning and recognition of 3D
objects represented by multiple views. The set of objects was extracted from the
COIL-100 database from Columbia University. We carried out the study with 100 isolated
objects, each one represented by 72 views (one view every 5 degrees). The
test set was composed of 36 views per object (taken at the angles 0, 10, 20 and so on),
whereas the reference set was composed of the 36 remaining views (taken at the
angles 5, 15, 25 and so on).

The learning process was as follows: (1) perform colour segmentation on each
individual object view image; (2) create an adjacency graph for the
segmented regions of each object view; (3) transform the adjacency graph into an
attributed graph (AG) using the hue feature as the attribute of each graph node; (4)
synthesize a group of 35 object views in a FORG, FDG and SORG using the
algorithms described in [16] [19] (we use groupings of varying numbers of graphs to
represent an object in order to evaluate the results; specifically, we used 3, 4, 6 and 9
random graphs for each 3D object). The recognition process follows a similar
procedure, but instead of synthesizing the graphs, a distance measure between them
was applied to evaluate to which 3D object the input graph belonged.
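
Steps (2) and (3) can be sketched as follows, assuming that the segmentation is available as a label image and that the hue of each pixel is given on a [0, 1) scale. The function name, the 4-neighbour adjacency and the plain arithmetic mean of the hue (which ignores that hue is a circular quantity) are assumptions made for illustration.

```python
def attributed_graph(labels, hue):
    """Build a region adjacency graph with the mean hue as node attribute."""
    rows, cols = len(labels), len(labels[0])
    sums, counts, edges = {}, {}, set()
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            sums[a] = sums.get(a, 0.0) + hue[r][c]
            counts[a] = counts.get(a, 0) + 1
            # Look right and down so each 4-neighbour pair is visited once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    edges.add(frozenset((a, labels[rr][cc])))
    nodes = {a: sums[a] / counts[a] for a in sums}
    return nodes, edges


# Toy segmentation with three regions (hypothetical data).
labels = [[1, 1, 2],
          [1, 3, 2],
          [3, 3, 2]]
hue = [[0.10, 0.10, 0.60],
       [0.10, 0.30, 0.60],
       [0.30, 0.30, 0.60]]
nodes, edges = attributed_graph(labels, hue)
```

Here `nodes` maps each region label to its hue attribute and `edges` holds the unordered pairs of adjacent regions, which together form one AG per view.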

Fig. 8 shows 20 objects at angle 100 and their segmented images with the adjacency
graphs. FORGs, FDGs and SORGs were synthesised automatically using the AGs in
the reference set that represent the same object. The method of incremental synthesis,
in which the FDGs are updated while new AGs are sequentially presented, was
applied. We made 6 different experiments in which the number of random graphs
(FORGs, FDGs and SORGs) that represent each 3D object varied. If the 3D object
was represented by only one random graph, the 36 AGs from the reference set that
represent the 3D object were used to synthesise the random graph. If it was
represented by 2 random graphs, the first 18 consecutive AGs from the reference
set were used to synthesise one of the random graphs and the other 18 AGs were used
to synthesise the other random graph. A similar method was used for the other
experiments with 3, 4, 6 and 9 random graphs per 3D object. Note that if 4 random
graphs are used, then each random graph represents 90 degrees of the 3D object.
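
The experimental protocol can be written down as a short sketch: the 72 views are split into test and reference sets by angle, and the ordered reference views are divided into consecutive groups, one group per random graph. The function and variable names are hypothetical.

```python
def split_reference_views(reference_angles, n_graphs):
    """Split the ordered reference views into n_graphs consecutive groups."""
    size, remainder = divmod(len(reference_angles), n_graphs)
    if remainder:
        raise ValueError("views do not divide evenly among the random graphs")
    return [reference_angles[i * size:(i + 1) * size] for i in range(n_graphs)]


angles = list(range(0, 360, 5))                          # 72 views per object
test_angles = [a for a in angles if a % 10 == 0]         # 0, 10, 20, ...
reference_angles = [a for a in angles if a % 10 == 5]    # 5, 15, 25, ...

# The six experiments: 1, 2, 3, 4, 6 and 9 random graphs per 3D object.
groupings = {k: split_reference_views(reference_angles, k)
             for k in (1, 2, 3, 4, 6, 9)}
```

With 4 random graphs, each group holds 9 reference views spaced 10 degrees apart, that is, one quadrant of the object.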

The best results were obtained with the SORG and FDG representations, the
SORG representation being the better of the two. Fig. 9 shows the ratio of recognition success for
the 100 objects using different object representations and distance measures. This
figure also shows the result of describing each object view individually by means of
an AG and then comparing each input AG against the rest of the prototype AGs.




Fig. 8. Some objects at angle 100 and the segmented images with the AGs 




Fig. 9. Ratio of recognition correctness of the 100 objects using SORG, FDG, FORG and AG-AG

5 Conclusions

Robot vision methods require close attention to two important issues. The first is the real
time issue: the methods must have adaptable mechanisms to overcome the variance in
the sensing of the basic perception features and they must be robust. Another
desirable feature in robot vision is that the objects, map, motion and control models
must be learned on line, not only in one path but over successive robot motions. In
this article we have presented some of the methods for tracking and object learning
that we are developing following these ideas. We have also applied the same ideas to
map building.

Abstract. A method for graduated scale inspection using computer vision is
proposed. We deal mainly with the lens distortion problem in the image acquisition
device, because of its influence on the uncertainty of the graduated scale inspection
process. This paper presents an algorithm for image correction by means of
camera calibration and distortion compensation. The camera calibration method
provides the ideal undistorted coordinates of the system using as input distorted
images of a 2D calibration pattern. The distortion compensation stage is
implemented using the ideal undistorted coordinates as an unwarped mesh.
Distortion compensation can then be applied to any image acquired with the
system, improving the inspection procedure. Test results using real data are
presented. We also describe the image feature extraction approach used to
automate the process.



1 Introduction 

There is an important effort to take advantage of computer vision systems in the
dimensional metrology field [1]. Computer vision hardware and software have been
used to improve calibration, measurement and inspection processes. However, the main
problems arise when high accuracy is needed in order to comply with strict
uncertainty levels. This is the case in the graduated scale inspection process, where
the uncertainty source comes from lens distortion in the image intensifier device
coupled to the image acquisition system. Consequently, lens distortion must be taken
into account in measurement or inspection outcomes.

The graduated scale inspection process consists of a comparison between
the scale under inspection and a length standard, as Fig. 1 shows.



Fig. 1. Graduated scale inspection. A typical scale is compared against a length standard.

In Fig. 1 the reference cursor on the length standard is placed mechanically against
a measurement interval on the scale under inspection. The user does this through an
observation device. For this example the measurement error is calculated as the
difference between the length standard reading, 2,15, and the scale under inspection, 2,0, resulting in
an error of 0,15 units of length. In our practical implementation of the layout in Fig. 1,
we use an optical encoder as the length standard, a multimedia camera with a
microscope as the observation device and a mechanical artifact that supports the
whole arrangement in accordance with international recommendations [4]. The
accuracy of the inspection thus depends on the exactness of the positioning of the cursor
over the printed graduation on the scale. Two factors influence the positioning:
the mechanical device and the judgment of the user who is controlling the
inspection through the observation device. We focus on the second factor,
supplying undistorted images to the user and thus improving the visual inspection.

This paper then deals with image distortion correction for the particular case of
graduated scale inspection using the layout described. In particular, the distortion
generated by the combination of the camera and microscope lenses produces
deformations that are poorly captured by the classical distortion models described in the
literature [6]. Instead, we adopt a mesh unwarping approach in which the whole
image is subdivided and a local correction is applied to each independent region. The
camera calibration procedure [2] provides a reference unwarped mesh: it takes as
input real distorted images of a 2D grid pattern, and a least squares approach enforces the
“straight lines in the scene should be straight in the image” criterion. Furthermore, image
feature extraction is used in order to achieve a reliable and automated system.

The paper is organized as follows. Section 2 describes the camera calibration
method using real distorted images of a 2D calibration pattern. A warped mesh is
obtained by means of image feature extraction and is used by the least squares
approach to derive the unwarped mesh. Section 3 presents the distortion
compensation algorithm. The algorithm takes both the warped and unwarped meshes to
produce unwarped images. The performance of our method is demonstrated in
Section 4 by the results of applying distortion compensation to real images of
graduated scales. Section 5 gives concluding remarks and future work.



2 Camera Calibration Method 

In the camera calibration procedure, we use an approach similar to the pinhole camera
model [5]. The model presented in this paper differs from it in two main respects. The
first is that the scene is magnified and not inverted compared to the pinhole camera
[2]. The second is that the short field of view of the acquisition system imposes planar
scenes with no depth. A 2D calibration pattern is therefore used and the depth remains a
free parameter. The calibration model is shown in Fig. 2. In this figure, the calibration
pattern is placed in the target plane, T, and the image is formed in the image plane, I.
The model is completed with the focal plane, F, which contains the focal point. Then,
a feature in the calibration pattern, M, is projected onto the image plane as m. The focal and
image planes are separated by f units, and the target and image planes by d units of length.
The image plane has two components, distorted and undistorted, related by a
distortion model. In order to perform image-processing tasks, image coordinates are
stored and managed in row-column computer arrays. The distortion produces warped
and unwarped computer arrays. The parallel depth alignment (Z axis) between the
target and image planes is achieved by focusing.




Fig. 2. Camera model. A geometrical model relates global coordinates to array coordinates.

The main idea of calibration is to obtain a set of unwarped features starting from a 
set of warped features. As we will show, features are line crossings in the image 
formed by the calibration target. To achieve automation, the procedure is performed 
in two stages: feature extraction and distortion estimation. 

2.1 Feature Extraction 

Feature extraction is performed as described in [2], with some modifications. The
procedure starts with an original image of a 0,5 × 0,5 mm calibration grid (Fig. 3(a)).
Then, a gray scale closing filter extracts the background. The subtraction between the
original image and its background, followed by thresholding, yields a simplified binary
image. Horizontal and vertical line extraction is applied to the binary image by
convolving with horizontal and vertical structural elements. The feature image in Fig.
3(b) is obtained by computing the center of gravity of each intersection of horizontal and
vertical lines. Each center of gravity is the starting point for the search in a
finer subpixel feature extraction.
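
The background extraction and thresholding steps can be sketched in plain Python. The sketch assumes dark grid lines on a bright background, a square structuring element and a fixed threshold; these choices, the sign of the subtraction and the function names are assumptions, not details taken from [2].

```python
def _window_filter(img, size, pick):
    """Apply max or min over a size x size window, truncated at the border."""
    rows, cols, half = len(img), len(img[0]), size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = pick(window)
    return out


def closing(img, size=3):
    """Gray scale closing: dilation (max) followed by erosion (min)."""
    return _window_filter(_window_filter(img, size, max), size, min)


def extract_lines(img, size=3, threshold=50):
    """Binary mask of dark lines: background minus image, then thresholded."""
    background = closing(img, size)
    return [[1 if b - v > threshold else 0 for v, b in zip(row, brow)]
            for row, brow in zip(img, background)]


# 5 x 5 toy image: bright background (200) with a dark vertical line (20).
image = [[200, 200, 20, 200, 200] for _ in range(5)]
mask = extract_lines(image)
```

The structuring element must be wider than the grid lines, otherwise the closing does not remove them and the lines leak into the background estimate.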




Fig. 3. Image pre-processing using morphological.

The crossing points of the vertical and horizontal lines are extracted with subpixel
accuracy through a method based on the starting point and the analysis of a squared
pattern around it, similar to the one proposed in [3]. This method avoids the iterative
minimization procedure described in [2] and constitutes a closed form solution that
gains computation speed without a loss of exactness. The main idea
of the algorithm is to fit lines to the image data and then find their intersection
point. Two lines are needed for subpixel feature extraction: horizontal and vertical.
The lines are defined by analyzing the original grayscale image along a squared
pattern centered at each starting point. Fig. 4 shows a detail of the sampled squared
pattern around the starting point. The gray level profile of the pattern defines eight
border points, marked with circles in Fig. 4(a). The border points are arranged in
horizontal and vertical pairs to find their midpoints. Then the four midpoints, marked
with triangles in Fig. 4(a), define the two lines. Simple mathematics is used to
locate the intersection point, marked with a star, with subpixel accuracy. Fig.
4(b) shows a typical gray level profile of the squared pattern. The first order
derivative of the profile is computed. Then, the local maxima of the derivative are detected
and grouped into pairs to form the border points. Since the original image is defined on a
discrete domain, the gray level intensity along the pattern is computed using bilinear interpolation.
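
The two geometric ingredients of this step, bilinear sampling of the gray level and the intersection of the two lines defined by the four midpoints, can be sketched as follows. The border point values are invented for illustration, points are written as (row, column), and the function names are hypothetical.

```python
import math


def bilinear(img, r, c):
    """Gray level at a real-valued position (r, c) by bilinear interpolation."""
    r0, c0 = int(math.floor(r)), int(math.floor(c))
    r1, c1 = min(r0 + 1, len(img) - 1), min(c0 + 1, len(img[0]) - 1)
    dr, dc = r - r0, c - c0
    top = (1 - dc) * img[r0][c0] + dc * img[r0][c1]
    bottom = (1 - dc) * img[r1][c0] + dc * img[r1][c1]
    return (1 - dr) * top + dr * bottom


def midpoint(p, q):
    """Midpoint of a pair of border points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def intersect(p1, p2, p3, p4):
    """Intersection of the line p1-p2 with the line p3-p4."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p4[0] - p3[0], p4[1] - p3[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel")
    t = ((p3[0] - p1[0]) * d2[1] - (p3[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Hypothetical border points on the four sides of the square, as (row, col).
left = midpoint((9.6, 8.0), (10.8, 8.0))
right = midpoint((9.8, 12.0), (11.0, 12.0))
top = midpoint((8.0, 9.5), (8.0, 10.7))
bottom = midpoint((12.0, 9.7), (12.0, 10.9))
crossing = intersect(left, right, top, bottom)   # subpixel crossing point
```

Because the crossing point comes from a direct formula, no iteration is needed, which is consistent with the closed form character of the method described above.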




Fig. 4. Squared pattern of the original grayscale image and its samples.



Then the whole image is processed using the method described in order to extract
all N feature points. Fig. 5(a) shows the extracted points overlying the
original image. Additional processing is required to correlate the warped feature points,
$(r, c)$, with their global positions defined in the grid pattern, $(X_w, Y_w)$, for each of the
$i = 1, \dots, N$ points. First, the lines in the horizontal and vertical line images are labelled
with consecutive numbers starting from one. Then, 2D global positions are assigned to the
feature points by looking up their positions in both label images. The $X_w$ and $Y_w$ coordinates
are calculated by multiplying the size of the grid by the label number. Fig. 5(b) shows the 2D
world coordinate assignment.

Fig. 5. Feature points with subpixel accuracy and their correlation with 2D global positions.



2.2 Distortion Estimation 



Distortion estimation is performed by first generating an unwarped set of points from
the extracted warped features, their correlation with 2D global positions and the
camera calibration model. In the camera calibration model, the main idea is to obtain
a set of parameters that relates world positions, $(X_w, Y_w)$, to the unwarped computer array,
$(c_u, r_u)$. Following the procedure described in [2] this relation is

$$
\begin{pmatrix} c_u \\ r_u \\ 1 \end{pmatrix} =
\begin{pmatrix} a_1 & -a_2 & t_c \\ a_2 & a_1 & t_r \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X_w \\ Y_w \\ 1 \end{pmatrix} \qquad (1)
$$

where

$$
a_1 = s\cos\theta, \quad a_2 = s\sin\theta, \quad t_c = s\,t_x + c_0, \quad t_r = s\,t_y + r_0, \quad s = \frac{f\,s_v}{Z_w + f - d}
$$

and $t_x$, $t_y$ are the translations in X and Y, $\theta$ is the rotation about the Z direction,
$(r_0, c_0)$ is the pixel position of the origin, and $s_v$ is the aspect factor of the CCD camera.

Then the distortion estimation is derived as the relationship between the unwarped, $(r_u, c_u)$,
and warped, $(r, c)$, computer arrays. Equation (1) should fit a set of $N$
observations $(c_i, r_i)$, $(X_{w,i}, Y_{w,i})$, $i = 1, \dots, N$, to the model with adjustable
parameters $p = (a_1, a_2, t_c, t_r)$. Therefore, p should minimize the following error

$$
\chi^2 = \sum_{i=1}^{N} \left[ (c_i - c_{u,i})^2 + (r_i - r_{u,i})^2 \right] \qquad (2)
$$

Substituting equation (1) into (2) and taking into account that at the minimum the
partial derivatives of $\chi^2$ with respect to p vanish, we obtain the following
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The equation (3) can be resolved for p taken as input data 2D global positions 
YJ) and their correlated projections into the image plane {c., r), since 
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Then, equation (1) can be used to generate an unwarped grid using the warped 
features and their correlation with 2D global positions (Fig. 6). 
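
As an illustration of the least-squares fit described above, the following minimal sketch assumes that the camera model reduces to a 2D similarity transform, c = a1*X - a2*Y + a3 and r = a2*X + a1*Y + a4, and solves the normal equations in closed form. The function names `fit_similarity` and `project`, the coefficient names and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
def project(params, x, y):
    """Apply the assumed model: world point (x, y) -> unwarped pixel (c, r)."""
    a1, a2, a3, a4 = params
    return a1 * x - a2 * y + a3, a2 * x + a1 * y + a4


def fit_similarity(world_pts, pixel_pts):
    """Least-squares estimate of (a1, a2, a3, a4) from point correspondences."""
    n = len(world_pts)
    mx = sum(p[0] for p in world_pts) / n
    my = sum(p[1] for p in world_pts) / n
    mc = sum(q[0] for q in pixel_pts) / n
    mr = sum(q[1] for q in pixel_pts) / n
    num1 = num2 = den = 0.0
    for (x, y), (c, r) in zip(world_pts, pixel_pts):
        xc, yc, cc, rc = x - mx, y - my, c - mc, r - mr  # centred coordinates
        num1 += xc * cc + yc * rc
        num2 += xc * rc - yc * cc
        den += xc * xc + yc * yc
    a1, a2 = num1 / den, num2 / den
    # The offsets follow from the centroids once a1 and a2 are known.
    a3 = mc - a1 * mx + a2 * my
    a4 = mr - a2 * mx - a1 * my
    return a1, a2, a3, a4


# Synthetic check: recover known parameters from noise-free grid points.
world = [(float(x), float(y)) for x in range(4) for y in range(3)]
true_params = (2.0, 0.5, 100.0, 50.0)
pixels = [project(true_params, x, y) for x, y in world]
estimate = fit_similarity(world, pixels)
unwarped_grid = [project(estimate, x, y) for x, y in world]
```

With real warped features as `pixels`, `unwarped_grid` would play the role of the reference unwarped grid, and the residuals between both sets would describe the distortion.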




Fig. 6. Unwarped grid overlying the original warped image. 



3 Distortion Compensation Algorithm 

We test with the following well-known model of distortion 

$$ r_u = r + \delta_r(r, c), \qquad c_u = c + \delta_c(r, c) $$

where $r_u$ and $c_u$ are the ideal unwarped coordinates, $r$ and $c$ are the corresponding 
warped coordinates, and $\delta_r(r, c)$ and $\delta_c(r, c)$ are bivariate polynomials of degree $n$. 
The bivariate polynomial method fails to represent the real system distortion 
because local distortions are not in agreement with the proposed 
model. In general, cheap lenses, like the one used in this vision system, present a 
substantial amount of local distortion. Therefore, to increase the accuracy of the 
distortion compensation stage, we introduce a region-based approach similar to the 
mesh-warping algorithm used in image morphing [7]. This algorithm requires both 
source and destination meshes to perform bilinear transformations between regions 
inside the mesh. As a first step, and referring to Fig. 7(a), a uniform auxiliary 
mesh (circle sign) is placed on top of the non-uniform mesh (dot sign) obtained in the 
feature extraction stage. In this way quadrilateral regions are defined, and the 
features that do not define quadrilaterals are discarded. Quadrilateral regions are 
marked with the triangle sign; each is enclosed by exactly four feature points marked 
with both circle and dot signs. Then, each warped quadrilateral sub-region is transformed 
to the corresponding unwarped quadrilateral sub-region, as shown in Fig. 7(b), 
according to the following backward transformation formula 

$$ r = a_0 + a_1 r_u + a_2 c_u + a_3 r_u c_u, \qquad c = b_0 + b_1 r_u + b_2 c_u + b_3 r_u c_u \qquad (4) $$



where parameters $a$ and $b$ are defined by the knowledge of the eight vertex points 
as follows 

Fig. 7. Region-based approach for distortion compensation. 



Finally, applying the backward transformation in (4) to the whole original image, 
we obtain the unwarped image shown in Fig. 8(a). Once the camera is calibrated using 
the procedure described here, the visual inspection of a graduated scale can be 
improved with the backward transformation, as shown in Fig. 8(b). 
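
The region-based backward transformation can be sketched as follows. The sketch assumes that each quadrilateral contributes four vertex correspondences (four unwarped and four warped vertices, eight points in total) and that the eight coefficients of the bilinear mapping are obtained by solving two 4x4 linear systems. All function names and the sample vertices are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def bilinear_coefficients(unwarped, warped):
    """unwarped, warped: four (r, c) vertices each, listed in matching order."""
    basis = [[1.0, ru, cu, ru * cu] for ru, cu in unwarped]
    a = solve_linear(basis, [r for r, _ in warped])
    b = solve_linear(basis, [c for _, c in warped])
    return a, b


def backward_map(a, b, ru, cu):
    """Warped position (r, c) that corresponds to the unwarped pixel (ru, cu)."""
    r = a[0] + a[1] * ru + a[2] * cu + a[3] * ru * cu
    c = b[0] + b[1] * ru + b[2] * cu + b[3] * ru * cu
    return r, c


# One quadrilateral: unwarped vertices on a regular mesh, warped ones measured.
unwarped = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
warped = [(1.0, 2.0), (0.5, 12.5), (11.0, 1.5), (12.0, 13.0)]
a, b = bilinear_coefficients(unwarped, warped)
center_source = backward_map(a, b, 5.0, 5.0)  # where to sample the warped image
```

To build the unwarped image, each output pixel inside the quadrilateral would be filled with the (interpolated) gray level found at the position returned by `backward_map`.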



4 Results 

In order to obtain quantitative results, we perform two experiments to verify the 
“straight lines in the scene should be straight in the image” criterion. In the first 
experiment, we fit the unwarped feature points to straight lines, and the sum of squared 
residuals is compared against the one obtained with the warped features. The 
sum of squared residuals for the warped grid is 19.4188 (pixels) against 1.4086×10 ‘^ 
for the unwarped one. In the second experiment we test the uniformity of the distance 
between vertical and horizontal lines in the warped and unwarped images of the 
calibration grid. We obtained a uniform distance between lines of 23.2 pixels for the 
unwarped case against a non-uniform distance of 21.6 to 25 pixels for the warped 
case. In our view, the two experiments support the main idea of 
projecting the scene without distortion. Further results of the automated feature extraction, 
camera calibration and distortion compensation procedures were presented throughout 
the paper. 

Fig. 8. Distortion compensation using the region-based approach described. 
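
The first experiment can be illustrated with a short sketch. The text does not state which line-fitting method was used, so the sketch assumes an orthogonal (total least squares) fit, for which the sum of squared residuals equals the smaller eigenvalue of the 2x2 scatter matrix of the points. The function name and the sample points are illustrative only.

```python
import math


def line_fit_residual(points):
    """Sum of squared orthogonal distances from the points to their best-fit line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Smaller eigenvalue of the scatter matrix [[sxx, sxy], [sxy, syy]].
    return 0.5 * (sxx + syy - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))


# A perfectly straight row of features and a slightly bowed one.
straight = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
bowed = [(0.0, 0.0), (1.0, 2.3), (2.0, 4.3), (3.0, 6.0)]
residual_straight = line_fit_residual(straight)
residual_bowed = line_fit_residual(bowed)
```

Summing this residual over every grid line, once for the warped and once for the unwarped features, gives two numbers that can be compared as in the experiment above.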



5 Conclusions and Future Work 

A computer vision system for graduated scale inspection was presented. The main 
problem faced was the optical distortion in the acquisition hardware. A reference 
set of unwarped features was generated using camera calibration 
tools. The algorithm then generates nearly ideal images by means of the distortion 
compensation method. In addition, feature extraction and image processing made it 
possible to automate the process. The results indicate the reliability of the approach in the 
“straight lines in the scene should be straight in the image” sense. For metrological 
purposes, the accuracy of the system was tested qualitatively and quantitatively by 
means of two experiments. 

In the future, additional image processing can be done in order to complete the 
automation of the measurement process and avoid human intervention. In our 
view, the layout presented in Fig. 1 will undergo few changes. First, the length 
standard will be precisely placed in a known position. Then, the vision system will 
assign a measurement to the observation process. The main remaining problem 
concerns the human knowledge that is used in a measurement with graduated 
scales. 


Abstract. Computer vision systems are playing an important role in 3D 
measurement for industrial applications. Real-time image processing algorithms 
are useful for achieving reliable feature extraction from the global 
environment starting from planar images. For example, in a structured light 
vision system it is essential to extract the pattern that a laser source forms 
on the objects under inspection. This work describes a single 
stripe image extraction algorithm that could be used as an analysis tool in such 
structured light systems. The experimental setup is implemented using the 
following equipment: a PC equipped with a frame grabber, a high-resolution 
CCD camera, a laser stripe projector, and specific software developed using 
Visual C++ 6. The system achieves real-time operation and high 
subpixel accuracy. 



1 Introduction 

Recently, measurement techniques using computer vision with real-time operation 
have been developed in our dimensional metrology laboratory for industrial 
applications. Computer vision efforts are directed at reducing costs and operating time 
in the industrial field [1]. Additionally, measurement vision systems offer the great 
advantage of measuring without mechanical contact. However, vision systems still 
lack high accuracy due to some restrictions in the image formation process. For this 
reason, our approach to developing a measuring vision system relies on 
auxiliary techniques that contribute to high accuracy. For example, 
the mechanical arrangement in the computer vision layout can be tested with higher 
accuracy standards, like coordinate measuring and form scan machines. Moreover, 
the accuracy of a measurement vision system is not only limited by the resolution of 
the acquisition devices; the planar image, highlighted by a structured light source, is 
improved, which reduces some limitations in the image formation process. Also, the 
structured light simplifies the image management for feature 
extraction [2]. 

It is well known that computer vision techniques take advantage of subpixel feature 
extraction [6]. In particular, a structured light system needs to extract the stripe image 
that is formed by the intersection of the light plane of a laser stripe projector with the 
object under inspection. Then, the relevant extracted features can be reconstructed in a 3D 
environment using the triangulation principle together with the camera and laser 
source calibration. Therefore, it is possible to perform 3D measurements using 
monocular vision [5]. 

This paper deals with subpixel laser stripe profile extraction in planar images 
coming from a structured light system for metrological purposes. The aim is to 
create a real-time system to be operated as a high-accuracy measurement 
instrument. The relevance of our work lies in the development of a practical 
implementation, that is, applying a feature extraction algorithm to create a useful tool 
with industrial application. The organization of this paper is as follows: Section 2 
presents the computer vision system principle for dimensional measurements. A 
review of an algorithm for laser stripe profile extraction is presented in Section 3. We 
place emphasis on a closed-form solution that meets the accuracy and 
processing-time requirements. Section 4 describes the hardware and software platform supporting our 
implementation. Section 5 presents some results of applying three-dimensional 
reconstruction to the feature extraction process. Finally, our work is summarized in 
Section 6. 



2 Measurement Principle Using Structured Light 

Our scheme for dimensional measurement using computer vision is based on the 
well-known triangulation principle and the layout in Fig. 1(a). The hardware was 
selected in order to gain high accuracy at low cost. The laser stripe projector generates 
a sharp light pattern that intersects the 3D object under inspection. Then the 
pattern is digitized with a CCD camera and a frame grabber attached to a computer. 
The resulting planar image contains simplified information about the object, useful 
for 3D reconstruction purposes. The 2D stripe pattern can then be reconstructed 
in 3D by means of camera and laser plane 
calibration, as Fig. 1(b) shows. 




[Figure 1, panels (a) and (b). Legible labels: Planar image; Camera calibration.] 



Fig. 1. Dimensional measurement using computer vision. 



The layout of Fig. 1 involves the following main tasks: 

1. Camera calibration. This consists of finding the mathematical model that relates 3D global 
coordinates to their planar projection [7, 8]. 

2. Laser plane calibration. This consists of finding the mathematical expression of the 
laser plane. 

3. Stripe image processing. The main idea is to extract the spine of the stripe image 
with subpixel accuracy. 

4. Reconstruction. This is the aim of the measurement instrument; it takes as inputs 
the calibration results and the stripe image and generates 3D positions. 

In this paper we focus mainly on the stripe image processing stage, due to its 
importance in the measurement process and its relevance to the automation of the 
procedure. 



3 Stripe Image Processing 

The aim of the algorithm described in this section is to extract the spine of the stripe 
image with subpixel accuracy. The physical setup presented in Fig. 
1 produces planar stripe images whose intensity profile approximates a Gaussian distribution. 
The feature extraction process provides a reduced set of descriptor parameters 
and also contributes to improving the reconstruction. The algorithm is 
divided into two main stages: image simplification, in order to isolate the stripe profile, 
and subpixel feature extraction, in order to accurately obtain the set of parameters 
that make up the spine. 



3.1 Image Simplification 

Fig. 2 shows typical planar images generated by the proposed layout in Fig. 1. 
Fig. 2(a) shows the scene with the laser stripe projector turned off. 
In Fig. 2(b), the projector is turned on and the laser plane intersects the scene. 
Fig. 2(c) shows the absolute value of the difference between Figs. 
2(a) and 2(b), which isolates the profile of the object under inspection. The laser 
projector is turned on and off by simple electronics and control software. 
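
The image simplification step amounts to a pixel-wise absolute difference. A minimal sketch, assuming both frames are gray-level images of equal size stored as nested lists (the function name and the tiny sample images are illustrative):

```python
def isolate_stripe(laser_off, laser_on):
    """Absolute pixel-wise difference of two equally sized gray-level images."""
    return [[abs(on - off) for off, on in zip(row_off, row_on)]
            for row_off, row_on in zip(laser_off, laser_on)]


# Two 2x3 frames: the bright middle column is the laser stripe.
frame_off = [[10, 10, 10], [10, 12, 10]]
frame_on = [[10, 200, 11], [9, 210, 10]]
stripe = isolate_stripe(frame_off, frame_on)
```

Background pixels cancel almost completely, so only the stripe keeps a high gray level in the result.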



3.2 Subpixel Feature Extraction 

In this stage [4], the image in Fig. 2(c), $I(u, v)$, is processed by establishing initial 
conditions for a center of the stripe $\mathbf{p}_1 = (p_x, p_y)$ together with a line width $w_1$ and a line 
direction $\mathbf{d}_1$. The initial center of the stripe is obtained using a scan procedure 
from top to bottom and left to right in image coordinates. The extracted information 
forms a line element $e_1(\mathbf{p}_1, w_1, \mathbf{d}_1)$, as shown in Fig. 3(a). Starting from the 
initial conditions, the subsequent line elements are generated in the following way. 
The next line element $e_{i+1}(\mathbf{p}_{i+1}, w_{i+1}, \mathbf{d}_{i+1})$ is estimated by forming a circular pattern $\mathbf{t}(\mathbf{p}_i, r)$ 
with center $\mathbf{p}_i$ and radius $r = 0.6\,w_i$. The pattern is used as a reference trajectory for 
finding the borders of the stripe, $\mathbf{b}_1$ and $\mathbf{b}_2$, as shown in Fig. 3(b). The position, direction 
and width of the next line element are given by 

$$ \mathbf{p}_{i+1} = \frac{\mathbf{b}_1 + \mathbf{b}_2}{2}, \qquad \mathbf{d}_{i+1} = \mathbf{p}_{i+1} - \mathbf{p}_i, \qquad w_{i+1} = \lVert \mathbf{b}_1 - \mathbf{b}_2 \rVert $$

Fig. 3. Line elements of a gray scale image. 

The border points $\mathbf{b}_1$ and $\mathbf{b}_2$ are detected by analyzing the gray level intensity along the 
circular pattern $\mathbf{t}$. The circular pattern is a set of $\lfloor 2\pi r \rfloor$ samples taken over the 
perimeter of the circumference with center $\mathbf{p}$ and radius $r$; in other words, 
$\mathbf{t}(\mathbf{p}, r) = (c_0, c_1, \dots, c_{\lfloor 2\pi r \rfloor - 1})$, where 

$$ c_i = I\big(p_x + r\cos(i\,\delta\theta),\; p_y + r\sin(i\,\delta\theta)\big) $$

for $i = 0, 1, \dots, \lfloor 2\pi r \rfloor - 1$ and $\delta\theta = 2\pi / \lfloor 2\pi r \rfloor$. Due to the discrete nature of the gray 
scale image $I$, the samples $c_i$ are computed using bilinear interpolation. Then, the first 
derivative of $\mathbf{t}(\mathbf{p}, r)$, $d\mathbf{t}(\mathbf{p}, r)$, is computed using finite differences. The local maxima 
of the magnitude $|d\mathbf{t}(\mathbf{p}, r)|$ are detected and grouped into pairs by means of the sign 
of $d\mathbf{t}(\mathbf{p}, r)$. Given a pair of border points, a line element is reported only if the 
magnitude $|d\mathbf{t}(\mathbf{p}, r)|$ at the border points is larger than a threshold. 
Otherwise, a new scan procedure begins. In our practical implementation, we use 
only one half of the circular pattern, as the zoomed detail in Fig. 4(a) shows. The gray 
level of the pattern is shown in Fig. 4(b). Fig. 5(a) shows the whole spine 
extraction of a stripe image and Fig. 5(b) presents a detail. 
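
The sampling of the circular pattern and the update of the line element can be sketched as below. This is a simplified illustration, not the authors' Visual C++ code: it samples the full circle rather than one half, it assumes the image is indexed as `image[row][column]`, and every function name is hypothetical.

```python
import math


def bilinear(image, x, y):
    """Gray level at the real-valued position (x, y); x is the column, y the row."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def circular_pattern(image, center, radius):
    """Samples c_i taken along the circle, roughly one per pixel of arc length."""
    count = int(math.floor(2 * math.pi * radius))
    step = 2 * math.pi / count
    px, py = center
    return [bilinear(image, px + radius * math.cos(i * step),
                     py + radius * math.sin(i * step)) for i in range(count)]


def pattern_derivative(samples):
    """Circular forward finite differences of the sampled gray levels."""
    n = len(samples)
    return [samples[(i + 1) % n] - samples[i] for i in range(n)]


def next_line_element(p, b1, b2):
    """Position, direction and width of the next element from two border points."""
    p_next = ((b1[0] + b2[0]) / 2, (b1[1] + b2[1]) / 2)
    d_next = (p_next[0] - p[0], p_next[1] - p[1])
    w_next = math.hypot(b1[0] - b2[0], b1[1] - b2[1])
    return p_next, d_next, w_next


# Constant 10x10 image: every sample equals 7 and the derivative vanishes.
flat = [[7.0] * 10 for _ in range(10)]
samples = circular_pattern(flat, (5.0, 5.0), 3.0)
derivative = pattern_derivative(samples)
element = next_line_element((0.0, 0.0), (1.0, 2.0), (3.0, 2.0))
```

On a real stripe image, the border points would be taken at the paired local maxima of the derivative magnitude that exceed the threshold mentioned above.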




Fig. 4. Circular pattern. 



Fig. 5. Spine extraction with subpixel accuracy. 



4 Hardware and Software Platform 

The layout of the computer vision system for dimensional measurement 
is presented in Fig. 6(a). In this figure, a PC equipped with a DT3155 
frame grabber [3] acquires 640×480×8 planar images coming from a high-resolution 
CCD camera. In order to achieve real-time processing, the algorithm described 
in the previous section was written in Visual C++ 6.0. The user interface is shown in 
Fig. 6(b). 



Fig. 6. Hardware and software platform for dimensional measurement using computer vision. 



5 Experimental Results 



Some experimental results can be reported with the 3D reconstruction of the spine 
extracted from the stripe image. The experimental arrangement, Fig. 7, is supported 
by a coordinate measuring machine for the purpose of system calibration. After laser 
plane and camera calibration, the reconstruction is implemented with an approach 
similar to the one described in [5]. In our test setup, the object under inspection is a scaled 
model of a tooth. It is easy to prove that the reconstruction problem can be stated as 
follows. 

where $\{R, T_x, T_y, T_z, f, dp_x, dp_y, k, c_x, c_y\}$ are camera calibration parameters, $\{A, B, C, D\}$ 
are laser plane calibration parameters, $(X_f, Y_f)$ are coordinates in the digitized 
image, $(X_d, Y_d)$ are image distorted coordinates, $(X_u, Y_u)$ are image undistorted 
coordinates and $(x_w, y_w, z_w)$ are the 3D reconstructed coordinates of the 2D feature $(X_f, Y_f)$. 
Then, using simple matrix algebra, the 3D coordinates can be computed from 
equation (1). Applying equation (1) to the extracted spine, the 3D plot of Fig. 8 is 
obtained. Comparing the measurements obtained with the computer vision system 
described here against those performed with a coordinate measuring machine, we 
compute an RMS reconstruction error of 0.08 mm in a 100×100×100 mm 
measurement volume. 
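
The triangulation idea behind the reconstruction can be illustrated with a ray-plane intersection. The sketch below is a strong simplification under stated assumptions: an ideal pinhole camera at the origin looking along the z axis, undistorted image coordinates already expressed in the same length unit as the focal length, and the laser plane A x + B y + C z + D = 0 given in the camera frame. It does not reproduce the full calibrated model of the paper, and the function name is hypothetical.

```python
def triangulate(xu, yu, f, plane):
    """Intersect the viewing ray through (xu, yu, f) with the laser plane."""
    a, b, c, d = plane
    denom = a * xu + b * yu + c * f
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the laser plane")
    t = -d / denom  # scale factor along the ray t * (xu, yu, f)
    return (t * xu, t * yu, t * f)


# Laser plane z = 100 (A=0, B=0, C=1, D=-100) and a focal length of 10.
point = triangulate(1.0, 2.0, 10.0, (0.0, 0.0, 1.0, -100.0))
```

Applying such an intersection to every spine point yields one 3D profile per stripe image.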




Fig. 7. Experimental setup. 




Fig. 8. Plot of a 3D reconstruction. 



6 Conclusions and Future Work 

A vision system for subpixel laser profile extraction was presented. The system 
achieves high accuracy in both image processing and 3D reconstruction. 
An algorithm for feature extraction with real-time operation was tested in an 
experimental setup. The geometrically oriented approach of the image processing 
algorithm contributes to an easy implementation using Visual C++. For this reason we 
avoided third-party software for image or mathematical processing; our 
software was implemented using basic development tools. The results 
point our future work in the following directions. 

1. The development of an electro-mechanical device in order to scan the whole 
geometry of 3D work pieces. In our concept, a rotary support with angular 
position feedback is needed to perform a complete measurement process. 

2. The incorporation of 3D graphics in the user interface. We are planning the 
development of real-time rendering in the user interface using, for example, OpenGL. 

3. The calibration improvement. We are planning to build a rotary support and 
calibrate it with the use of higher accuracy standards, like coordinate measuring 
and form scan machines. 


Abstract. A framework for recovering high-resolution video sequences from 
sub-sampled and compressed observations is presented. Compression schemes 
that describe a video sequence through a combination of motion vectors and 
transform coefficients, e.g. the MPEG and ITU family of standards, are the 
focus of this paper. A multichannel Bayesian approach is used to incorporate 
both the motion vectors and the transform coefficients. Results show a 
discernible improvement in resolution over the whole sequence, as compared to 
standard interpolation methods. 



1 Introduction 

High-frequency information is often discarded during the acquisition and processing 
of an image. This data reduction begins at the image sensor, where the original scene 
is spatially sampled during acquisition, and continues through subsequent sampling, 
filtering or quantization procedures. Recovering the high-frequency information is 
possible though, as multiple low-resolution observations may provide additional 
information about the high-frequency data. This information is introduced through 
sub-pixel displacements in the sampling grid, which allows for the recovery of 
resolution. 

Although work has been devoted to the problem of reconstruction of one high 
resolution image from a sequence of low resolution ones (see for instance [1-5] and 
[6] for a review), not much work has been reported on the problem of increasing the 
resolution of a whole image sequence simultaneously (see however [7-9]). 

In this paper we present a new method to obtain a whole high resolution sequence 
from a set of low resolution observations. The method will use the relationship 
between the high resolution images in the sequences and also the process to obtain the 
low resolution compressed ones from their corresponding high resolution images. 

The rest of this paper is organized as follows: In Section 2, we formulate the 
problem within the Bayesian framework and define the acquisition system to be 
considered and the prior information we are going to use on the high resolution image 
sequence. In Section 3, we introduce an iterative algorithm for estimating the 
high-resolution sequence. In Section 4, we present results from the proposed procedure. 
Conclusions are presented in Section 5. 



2 System Model 



When images from a single camera are captured at closely spaced time instants, 
it is reasonable to assume that the content of the frames is similar. That is, we can say 
that 

$$ f_l(x, y) = f_k\big(x + d^x_{l,k}(x, y),\; y + d^y_{l,k}(x, y)\big) + n_{l,k}(x, y), \qquad (1) $$

where $f_l(x, y)$ and $f_k(x, y)$ are the gray level values at spatial location $(x, y)$ in the 
high-resolution images at times $l$ and $k$, respectively, $d^x_{l,k}(x, y)$ and $d^y_{l,k}(x, y)$ comprise the 
displacement that relates the pixel at time $k$ to the pixel at time $l$, and $n_{l,k}(x, y)$ is an 
additive noise process that accounts for any image locations that are poorly described 
by the displacement model. 

The expression in (1) can be rewritten in matrix-vector form as 

$$ \mathbf{f}_l = C(\mathbf{d}_{l,k})\,\mathbf{f}_k + \mathbf{n}_{l,k}, \qquad (2) $$

where $\mathbf{f}_l$ and $\mathbf{f}_k$ are formed by lexicographically ordering each image into a 
one-dimensional vector, $C(\mathbf{d}_{l,k})$ is the two-dimensional matrix that describes the 
displacement across the entire frame, $\mathbf{d}_{l,k}$ is the column vector defined by 
lexicographically ordering the values $(d^x_{l,k}(x, y), d^y_{l,k}(x, y))$, and $\mathbf{n}_{l,k}$ is the noise process. 

When the images are $PM \times PN$ arrays, then $\mathbf{f}_l$, $\mathbf{f}_k$, $\mathbf{d}_{l,k}$ and $\mathbf{n}_{l,k}$ are column vectors with 
length $PMPN$ and $C(\mathbf{d}_{l,k})$ has dimension $PMPN \times PMPN$. 

The conversion of a high-resolution frame to its low-resolution and compressed 
observation is expressed as 

$$ \mathbf{y}_k = T_{DCT}^{-1}\, Q\!\left[ T_{DCT}\!\left( A H \mathbf{f}_k - \sum_{\forall i} C(\mathbf{v}_{k,i})\,\mathbf{y}_i \right) \right] + \sum_{\forall i} C(\mathbf{v}_{k,i})\,\mathbf{y}_i, \qquad (3) $$

where $\mathbf{y}_k$ is a vector that contains the compressed low-resolution image with 
dimension $MN \times 1$, $\mathbf{f}_k$ is the high-resolution data, $\mathbf{v}_{k,i}$ is the motion vector transmitted by 
the encoder that signals the prediction of frame $k$ from the previously compressed frame $i$, 
$C(\mathbf{v}_{k,i})$ represents the prediction process with a matrix (for images said to be 
“intra-coded”, the prediction from all frames is zero), $A$ is an $MN \times PMPN$ matrix that 
sub-samples the high-resolution image, $H$ is a $PMPN \times PMPN$ matrix that filters the 
high-resolution image, $T_{DCT}$ and $T_{DCT}^{-1}$ are the forward and inverse DCT calculations, and $Q$ 
represents the quantization procedure. 

Let $\mathbf{F}$ be the vector $(\mathbf{f}_1^T, \mathbf{f}_2^T, \dots, \mathbf{f}_L^T)^T$ that contains all the high-resolution frames 
and let $\mathbf{Y}$ be the vector $(\mathbf{y}_1^T, \mathbf{y}_2^T, \dots, \mathbf{y}_L^T)^T$ that contains all the low-resolution frames. 
We propose to follow a maximum a posteriori (MAP) 
estimation approach in recovering the high resolution information from the low 
resolution compressed observations. To this end, we will use the following 
approximation for the conditional distribution of the observed low resolution images 
given the high resolution sequence 

$$ P(\mathbf{Y} \mid \mathbf{F}) = P(\mathbf{y}_1, \dots, \mathbf{y}_L \mid \mathbf{f}_1, \dots, \mathbf{f}_L) = \prod_{i=1}^{L} P(\mathbf{y}_i \mid \mathbf{f}_i), \qquad (4) $$

where 

$$ P(\mathbf{y}_i \mid \mathbf{f}_i) \propto \exp\left\{ -\lambda_4 \lVert \mathbf{y}_i - A H \mathbf{f}_i \rVert^2 \right\}. \qquad (5) $$

This conditional distribution enforces similarity between the compressed low 
resolution image and its high resolution image (through a process of blurring and 
downsampling, represented by $H$ and $A$ respectively). With $\lambda_4$ we control this 
resemblance. 

In this paper we assume that the high resolution motion vectors have been 
previously estimated (see [6] for different approaches to perform this task). 

In the literature on motion estimation there are methods based on optical flow 
(see [10] and [11]), block matching [12], and feature matching. Simoncelli in [13] 
uses the optical flow equation but also adds an uncertainty model to solve the 
extended aperture problem and a Gaussian pyramid to deal with large displacements. 
Another interesting method was proposed by Irani and Peleg (see [14]) using an 
object-based approach. The motion parameters and the location of the objects (it is 
supposed that there are several moving objects in the image sequence) are computed 
sequentially, taking into account only one object at a time by using segmentation. A 
Gaussian pyramid from coarse to finer resolution is also used to avoid problems with 
the displacements. 

In our implementation the motion field has been computed, for all the compressed 
low resolution frames, by mapping the previous frame into the current one and then 
interpolating the resulting low resolution motion field to obtain the high resolution 
motion field. Better motion field estimation procedures, which would probably 
provide better reconstruction results, are currently under study. 

From equation (2), assuming smoothness within the high resolution images and 
trying to remove the blocking artifacts in the low resolution uncompressed images, we 
use the following prior model to describe the relationship between the high resolution 
images: 

$$ P(\mathbf{F}) \propto \exp\left\{ -\lambda_1 \sum_{i=2}^{L} \lVert \mathbf{f}_{i-1} - C(\mathbf{d}_{i-1,i})\,\mathbf{f}_i \rVert^2 - \lambda_2 \sum_{i=1}^{L} \lVert Q_1 \mathbf{f}_i \rVert^2 - \lambda_3 \sum_{i=1}^{L} \lVert Q_2 A H \mathbf{f}_i \rVert^2 \right\}. \qquad (6) $$

In the first term of the above prior distribution we are including the quality of the 
prediction (if the prediction of a frame from the previous one is a good prediction, 
this term will be small). The second and third terms represent smoothness constraints, 
where $Q_1$ represents a linear high-pass operation, $Q_2$ represents a linear high-pass 
operation across block boundaries, and $\lambda_2$ and $\lambda_3$ control the influence of the two 
norms. By increasing the value of $\lambda_2$, the density describes a smoother image frame, 
while increasing the value of $\lambda_3$ results in a frame with smooth block boundaries. 

After having defined the prior and degradation models, several points are worth 
mentioning. First, note that the degradation process (see equations (4) and (5)) relates 
each low resolution observation to its corresponding high resolution one, and so no 
prediction of high resolution images is included in it. Due to this fact, this model is 
different from the one currently used in most high resolution methods (see Segall et 
al. [6] for a review). Note also that the prior model is responsible for relating the high 
resolution images, and so a change in a high resolution image will enforce (through 
the prior model) changes in the other high resolution images. Finally, note that, 
although it is also possible to include prior models over the high-resolution motion 
vectors, in this work we assume that they have been estimated prior to the 
reconstruction process; see, however, [15-16] for the simultaneous estimation of 
high-resolution motion and images. Work on prior motion models which are consistent 
over time will be reported elsewhere. 



3 Problem Formulation and Proposed Algorithm 

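The maximum a posteriori (MAP) estimate provides the necessary framework for 
recovering high-resolution information from a sequence of compressed observations. 
Following the Bayesian paradigm, the MAP high resolution sequence reconstruction 
satisfies 

$$ \hat{\mathbf{F}} = \arg\max_{\mathbf{F}} \left\{ P(\mathbf{F})\, P(\mathbf{Y} \mid \mathbf{F}) \right\}. \qquad (7) $$

Applying logarithms to equation (7), we find that the high resolution image sequence 
estimate $\hat{\mathbf{F}}$ satisfies 

$$ \hat{\mathbf{F}} = \arg\min_{\mathbf{F}} \left\{ \lambda_1 \sum_{i=2}^{L} \lVert \mathbf{f}_{i-1} - C(\mathbf{d}_{i-1,i})\,\mathbf{f}_i \rVert^2 + \lambda_2 \sum_{i=1}^{L} \lVert Q_1 \mathbf{f}_i \rVert^2 + \lambda_3 \sum_{i=1}^{L} \lVert Q_2 A H \mathbf{f}_i \rVert^2 + \lambda_4 \sum_{i=1}^{L} \lVert \mathbf{y}_i - A H \mathbf{f}_i \rVert^2 \right\}. \qquad (8) $$

In order to find the MAP estimate we propose the following iterative procedure. Let $\mathbf{F}^0$ be an 
initial estimate of the high resolution sequence. Then, given the sequence 
$\mathbf{F}^n = (\mathbf{f}_1^n, \dots, \mathbf{f}_L^n)$, we obtain, for $l = 1, \dots, L$, the high resolution image $\mathbf{f}_l^{n+1}$ at 
step $n+1$ by using the following equation 

$$ \mathbf{f}_l^{n+1} = \mathbf{f}_l^n - \alpha_f \Big\{ -\lambda_1 C^T(\mathbf{d}_{l-1,l}) \big[ \mathbf{f}_{l-1}^n - C(\mathbf{d}_{l-1,l})\,\mathbf{f}_l^n \big] + \lambda_1 \big[ \mathbf{f}_l^n - C(\mathbf{d}_{l,l+1})\,\mathbf{f}_{l+1}^n \big] + \lambda_2 Q_1^T Q_1 \mathbf{f}_l^n + \lambda_3 H^T A^T Q_2^T Q_2 A H \mathbf{f}_l^n - \lambda_4 H^T A^T \big( \mathbf{y}_l - A H \mathbf{f}_l^n \big) \Big\} \qquad (9) $$

The relaxation parameter $\alpha_f$ determines convergence as well as the rate of 
convergence of the iteration. It is important to note that for the first and last frames in 
the sequence, $\mathbf{f}_1$ and $\mathbf{f}_L$ respectively, the frames $\mathbf{f}_0$ and $\mathbf{f}_{L+1}$ do not exist, and so the 
above equation has to be adapted by removing the terms involving $\mathbf{f}_0$ and $\mathbf{f}_{L+1}$, respectively. 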
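
The structure of the iteration can be illustrated on a toy problem. The sketch below is not the proposed algorithm itself but a heavily simplified gradient-descent analogue under explicit assumptions: frames are 1D signals, there is no motion (C is the identity), there is no blur (H is the identity), A keeps every second sample, Q_1 is a circular second difference, and the block-boundary term is dropped (lambda_3 = 0). All names and parameter values are illustrative.

```python
def decimate(f):
    """A: keep every second sample."""
    return f[::2]


def upsample(y, n):
    """A^T: put the samples back on the fine grid, zeros elsewhere (n even)."""
    out = [0.0] * n
    out[::2] = y
    return out


def laplacian(f):
    """Q1: circular second difference (a symmetric operator, so Q1^T = Q1)."""
    n = len(f)
    return [2 * f[i] - f[i - 1] - f[(i + 1) % n] for i in range(n)]


def cost(frames, obs, lam1, lam2, lam4):
    """Simplified version of the functional minimized by the MAP estimate."""
    total = 0.0
    for l, f in enumerate(frames):
        if l > 0:
            total += lam1 * sum((a - b) ** 2 for a, b in zip(frames[l - 1], f))
        total += lam2 * sum(v ** 2 for v in laplacian(f))
        total += lam4 * sum((y - a) ** 2 for y, a in zip(obs[l], decimate(f)))
    return total


def step(frames, obs, lam1, lam2, lam4, alpha):
    """One update of every frame, using only the estimates of the previous step."""
    new_frames = []
    for l, f in enumerate(frames):
        n = len(f)
        grad = [lam2 * v for v in laplacian(laplacian(f))]
        data = upsample([a - y for a, y in zip(decimate(f), obs[l])], n)
        for i in range(n):
            grad[i] += lam4 * data[i]
            if l > 0:  # the first frame has no predecessor
                grad[i] += lam1 * (f[i] - frames[l - 1][i])
            if l < len(frames) - 1:  # the last frame has no successor
                grad[i] += lam1 * (f[i] - frames[l + 1][i])
        new_frames.append([f[i] - alpha * grad[i] for i in range(n)])
    return new_frames


# Three identical low-resolution observations; start from their zero-order hold.
obs = [[1.0, 2.0, 3.0, 2.0]] * 3
frames = [[v for v in y for _ in range(2)] for y in obs]
start = cost(frames, obs, 1.0, 0.01, 1.0)
for _ in range(50):
    frames = step(frames, obs, 1.0, 0.01, 1.0, 0.125)
end = cost(frames, obs, 1.0, 0.01, 1.0)
```

The relaxation parameter must be small relative to the largest curvature of the functional; the values used here were chosen for this toy setting only and are unrelated to the experimental settings of the paper.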



4 Experimental Results 

The performance of the algorithm is illustrated by processing frames from the Mobile 
sequence. Each original image is 704x576 pixels; it is decimated by a factor of 
two in each dimension, cropped to a size of 176x144 pixels and compressed with an 
MPEG-4 encoder operating at 1024 kbps. Three frames from the compressed 
bit-stream are then sequentially provided to the proposed algorithm. $Q_1$ is a 3x3 discrete 
Laplacian, $Q_2$ is a difference operation across the horizontal and vertical block 
boundaries, and the model parameters were experimentally chosen to be $\lambda_1 = 100$, 
$\lambda_2 = 0.01$, $\lambda_3 = 0.002$, $\lambda_4 = 1$, and $\alpha_f = 0.125$. The algorithm is terminated when 




/iFjr <M0“’ • 



(10) 



The performance of the algorithm is measured in terms of the improvement in 
signal-to-noise ratio, defined by 

$$ ISNR = 10 \log_{10}\left( \frac{\lVert \mathbf{F} - \bar{\mathbf{Y}} \rVert^2}{\lVert \mathbf{F} - \hat{\mathbf{F}} \rVert^2} \right), \qquad (11) $$

where $\bar{\mathbf{Y}}$ is the zero-order hold of image $\mathbf{Y}$. A representative result is 
presented in Figs. 1-4. The original image is shown in Fig. 1, the compressed 
observation after bilinear interpolation in Fig. 2, and the image provided by the 
proposed algorithm in Fig. 3. Fig. 4 zooms in on a part of the image obtained by 
bilinear interpolation (Fig. 4a) and by our proposed method (Fig. 4b). The sign “Maree” 
is unreadable in the image shown in Fig. 4a, whereas it is almost readable in Fig. 4b. The 
smoothness constraint also performs well, as can be observed in the left area of the 
images. The corresponding ISNR values for Figs. 2 and 3 are 30.4123 dB and 
31.1606 dB, respectively. These figures, as well as the visual inspection, demonstrate 
the improvement obtained by the proposed algorithm. Fig. 5 plots the value of the 
stopping criterion of the algorithm (see equation (10)) as a function of the 
iteration number, demonstrating the convergence of the algorithm. 
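
The ISNR figure of merit can be computed as in the following sketch, shown for a single frame stored as a nested list; for a whole sequence the squared errors would be accumulated over all frames. The helper names and the tiny sample data are illustrative.

```python
import math


def zero_order_hold(image, factor):
    """Repeat every pixel `factor` times in both directions."""
    return [[v for v in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def squared_error(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def isnr(original, observed_lr, estimate, factor):
    """Improvement in signal-to-noise ratio, in dB, over the zero-order hold."""
    held = zero_order_hold(observed_lr, factor)
    return 10 * math.log10(squared_error(original, held) /
                           squared_error(original, estimate))


# A 2x2 original, its 1x1 observation and a reconstruction with one wrong pixel.
original = [[1.0, 2.0], [3.0, 4.0]]
observed = [[2.0]]
estimate = [[1.0, 2.0], [3.0, 3.0]]
gain_db = isnr(original, observed, estimate, factor=2)
```

A positive value means that the estimate is closer to the original than the plain zero-order hold of the observation.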




Fig. 1. Cropped part of the original image from the sequence before decimation and 
compression. The aim of the proposed method is to estimate all these images at the same time. 

Fig. 2. Decoded observations after bilinear interpolation. The compression artifacts are easily 
noticeable. 

Fig. 3. Image obtained by the proposed method. This figure should be compared 
with Fig. 2. 

Fig. 4. a) Decoded image after bilinear interpolation. b) The improvement achieved by the 
method. The sign “Maree” is unreadable in image a), whereas it is almost readable in b). The 
smoothness constraint works well, as can be observed in the left area of the images. 

Fig. 5. Convergence plot of the iterative procedure. The implemented method guarantees 
convergence. 



5 Conclusions 

In this paper we have proposed a new iterative procedure to estimate a high resolution 
video sequence from low resolution observations. The method uses fidelity to the low 
resolution data and smoothness constraints within and between the high resolution 
images to estimate the sequence. Incorporating temporal coherence of the high 
resolution motion vectors as well as the development of a parallel implementation of 
the algorithm are currently under study. The proposed method has been 
experimentally validated. 


Abstract. This paper proposes an approach to estimate 3D rigid facial motions 
through a stereo image sequence. The approach uses a disparity space as the 
main space in order to represent all the 3D information. A robust algorithm 
based on the RANSAC approach is used to estimate the rigid motions through 
the image sequence. The disparity map is shown to be a robust feature against 
local motions of the surface and is therefore a very good alternative to the 
traditional use of the set of interest points. 



1 Introduction 

To date, many efforts have been made to study the problem of camera motion from 
features extracted from monocular or stereo images [6,9,13]. The main approach 
estimates the motion by establishing correspondences between interest points on each 
image. There are two main shortcomings of such an approach; firstly, it requires the 
set of interest points on each image to lie on static 3D surfaces of the scene; and 
secondly, the surfaces of the scene must be textured enough to allow interest points to 
be estimated. When we approach the problem of estimating 3D rigid facial motions 
from images, we must estimate the rigid motion of a 3D surface 
that undergoes many instantaneous local deformations, usually due to local facial motions 
[1,5,8]. Furthermore, it is well known that the surface of the face is not textured 
enough. Therefore, alternatives to the traditional use of the set of interest points must 
be considered. In this paper, a homography between disparity spaces is used to 
estimate 3D rigid motions. Dense disparity maps are used as a feature from which the 
homography parameters can be estimated. 

Since we are interested in studying 3D object motions near the camera, we use the 
general perspective camera model in order to analyze our images. An important 
instance of this situation appears in 3D videoconferencing systems, where the 3D 
shape of the head and face of each participant must be refreshed in each instant of 
time, and the usual short distance between cameras and surfaces introduces strong 
perspective effects [12]. 
In Section 2, we introduce the geometrical concepts of the disparity space. In Section 
3, we study the rigid motion estimation in the disparity space. In Section 4, disparity 
map estimation is discussed. In Section 5, experiments carried out on image data are 
shown. Finally, in Section 6, discussions and conclusions are presented. 



2 Stereo Images 

Let us consider a calibrated rectified stereo rig, i.e. the epipolar lines are parallel to 
the x-axis. There is no loss of generality since it is possible to rectify the images of a 
stereo rig once the epipolar geometry is known [6]. We also assume that both cameras 
of the rectified stereo rig have internal parameters which are similar and known. 

Stereo reconstruction has been studied for years, and is now a standard topic in 
computer vision. Let us consider a rectified image pair, and let (x,y) and (x',y') be two 
corresponding points in that image pair. Since the corresponding points must lie on 
the same epipolar line, the relation between the two points is 

$$x' = x - d, \qquad y' = y \qquad (1)$$



where d is defined as the disparity of the point (x,y). From rectified stereo images, we 
can define representation spaces based on the projected coordinates that are 
equivalent to a 3D reconstruction of the points up to a homography of the 3D space 
[4]. These spaces are known as disparity spaces. The equations relating the 3D 
coordinates (X,Y,Z) with the disparity coordinates in the case of oriented and rectified 
cameras are [13]: 
where $x_0$, $y_0$ and $x'_0$ are the principal point coordinates of the left and right images, 
respectively, $\alpha$ and $\alpha'$ are the focal distances of the left and right cameras, respectively, 
and B is the baseline of the stereo rig. All image coordinates are expressed in terms of 
pixels. 

In this paper, we use the disparity space defined by the triple (x,y,d). From expression 
(2), taking $\alpha=\alpha'$, the homographic relationship between the 3D coordinates of a point 
$X=(X,Y,Z)^T$ and its associated disparity vector $(x,y,d)^T$ can be expressed as 
or in a shorter way as 

$\tau=(x,y,d)^T$    (4) 



From equation (3), it is clear that in the case of non-calibrated cameras each pair of 
rectified stereo images provides us with the reconstruction of the surface being 
imaged up to projectivity. From the intrinsic parameters of the stereo rig, the 
projective reconstruction can be upgraded to metric. 



3 Rigid Motions in the Disparity Space 



Let us apply a rigid motion on the 3D data. If X and X' represent the 3D coordinates 
of a point before and after the motion, then 
$$\begin{pmatrix} X' \\ 1 \end{pmatrix} = \begin{pmatrix} R & T \\ 0^T & 1 \end{pmatrix} \begin{pmatrix} X \\ 1 \end{pmatrix} \qquad (5)$$

From expressions (4) and (5) we obtain 

(6) 



Equation (6) describes the 3D homography F relating the disparity homogeneous 
coordinates of a point before and after the motion. 



3.1 Noise on the Data 

An important feature of the disparity space is that the noise associated with the data 
vectors $(x,y,d)^T$ can, under some assumptions, be considered isotropic and 
homogeneous. The x, y disparity coordinates are affected by the noise produced by 
the discretization effect, which without additional information can be assumed equal for 
all pixels. The noise on d is associated with the change in the gray level of the pixels in 
the stereo matching process and could be estimated from this process. We can 
therefore assume that the noises associated with x, y and d are independent. If we 
assume that the variance of d is of the same magnitude as the variance of the 
discretization error, the covariance matrix of the noise on each point of our disparity 
space is $Q=\sigma^2 I$. In our case, apart from the above measurement errors, we also 
assume that in our scene there are points in motion. All the correspondences 
associated with these moving points are therefore potentially erroneous. In order to 
select point correspondences which are unaffected by the moving points, we use the 
RANSAC algorithm to select the subset of point correspondences that are free of this 
contamination. 
3.2 Rigid Motion Estimation 

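Let $(\tau_i, \tau'_i)$ be a set of point correspondences. The problem of estimating the rigid 
motion parameters (R,T) from the set of points $(\tau_i, \tau'_i)$ amounts to minimizing the 
error 

$$E(R,T) = \sum_i (\tau'_i - \hat{\tau}'_i)^T Q^{-1} (\tau'_i - \hat{\tau}'_i)$$

where $\hat{\tau}'_i$ is the estimated Euclidean coordinate vector for $\tau_i$ from (6), 
and Q is the covariance matrix of the disparity vectors. Here 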
we assume an i.i.d. noise model. Equation (6) shows that this error function is not 
linear in the parameters (R,T), so a non-linear method has been used to estimate 
the vector of six unknowns obtained by parameterizing the rigid motion. Here we are 
interested in the case of small rotations (< 5 degrees), so the rotation matrix can be 
expressed as $R=I+[\omega]_\times$, where I is the identity matrix and $[\omega]_\times$ represents the 
skew-symmetric matrix associated with the vector $\omega$. In order to estimate the solution vector 
$(\omega,T)^T$, a quasi-linear iterative algorithm has been used on the normalized image 
coordinates [3]. An initial solution for the vector $(\omega,T)^T$ can be calculated from 
equation (6), solving the linear system that appears by considering the equations 
associated with the Euclidean coordinates of all the points $\tau$ and $\tau'$ and assuming all $\lambda=1$. In 
the next iteration we recalculate the value of $\lambda$ from the above solution and again 
solve equation (6) for a new solution. We iterate until convergence of the vector 
$(\omega,T)^T$. In our experience, three or four iterations are enough. 
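The small-rotation approximation used above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it compares the first-order form I + [w]x with the exact rotation given by the Rodrigues formula for a 5 degree rotation. The function names and the choice of the z axis are illustrative assumptions.

```python
import numpy as np

def skew(w):
    """Return [w]_x, the matrix such that skew(w) @ v equals np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def rotation_first_order(w):
    """First-order approximation R ~ I + [w]_x, meant for small rotations."""
    return np.eye(3) + skew(w)

def rotation_exact(w):
    """Exact rotation for the axis-angle vector w (Rodrigues formula)."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta == 0.0:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Hypothetical test case: a 5 degree rotation about the z axis.
w = np.deg2rad(5.0) * np.array([0.0, 0.0, 1.0])
max_gap = np.abs(rotation_exact(w) - rotation_first_order(w)).max()
print(max_gap)  # largest entry-wise difference between the two matrices
```

The largest discrepancy is 1 - cos(theta), which stays below 0.004 at 5 degrees, so the linearization is a reasonable starting point for the iterative refinement described in the text.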

Nevertheless, the presence of outliers in the correspondences between the disparity 
maps degrades the estimation considerably. In order to circumvent this problem a 
RANSAC based algorithm is proposed in Table 1. This algorithm makes a robust 
iterative linear estimation as a first approach, but because of the noise in the disparity 
estimation, a non-linear optimization step from the pixel color values is necessary. 



4 Disparity Map Estimation 

In this paper two different dense disparity maps are used. Firstly, we estimate the 
disparity map for each stereo image, and from this we estimate a region of interest by 
applying a binary thresholding operator on it. Secondly, we estimate the dense motion 
vector map associated to every two consecutive left and right images, respectively. In 
this case we assume that the region of interest is the region of moving pixels nearest 
the camera. 





Table 1. Iterative robust algorithm. 

I. Estimate and normalize the set of disparity vectors. 

II. Repeat N iterations: choose n>=2 disparity vectors randomly, then 

   i. For each vector calculate $\lambda_i$, $A_i$ and $b_i$. 

   ii. Solve $AX=b$ for X. 

   iii. Count the number of inliers. 

III. Take the solution with the highest number of inliers as the best linear solution. 

IV. Minimize the pixel color differences between images by 
applying the Levenberg-Marquardt algorithm starting from the linear 
solution. 
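The random sampling loop of Table 1 can be illustrated with a simplified sketch. It is not the algorithm of the paper: it assumes that the points are already expressed in Euclidean 3D coordinates (not in disparity space), uses the small-rotation model x' = x + w × x + T, draws three points per sample, and omits the final non-linear refinement on pixel colors. All names, the tolerance and the synthetic data are hypothetical.

```python
import numpy as np

def skew(v):
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def solve_small_motion(P, Q):
    """Least-squares (w, T) such that Q ~ P + w x P + T (small rotation)."""
    # w x p = -[p]_x w, so each point contributes the 3x6 block [-[p]_x | I].
    A = np.vstack([np.hstack([-skew(p), np.eye(3)]) for p in P])
    b = (Q - P).reshape(-1)
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    return sol[:3], sol[3:]

def ransac_rigid(P, Q, n_iter=200, n_sample=3, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(P), size=n_sample, replace=False)
        w, T = solve_small_motion(P[idx], Q[idx])
        residual = np.linalg.norm(P + np.cross(w, P) + T - Q, axis=1)
        inliers = residual < tol
        if inliers.sum() > best.sum():      # keep the largest consensus set
            best = inliers
    if best.sum() < n_sample:
        raise ValueError("no consensus set found")
    w, T = solve_small_motion(P[best], Q[best])  # refit on all inliers
    return w, T, best

# Synthetic data: 50 points, the first 15 corrupted to mimic local motions.
rng = np.random.default_rng(1)
P = rng.uniform(-1.0, 1.0, size=(50, 3))
w_true, T_true = np.array([0.01, -0.02, 0.03]), np.array([0.1, -0.05, 0.2])
Q = P + np.cross(w_true, P) + T_true
Q[:15] += 1.0
w_est, T_est, inliers = ransac_rigid(P, Q)
```

Three points are sampled here because two points leave the rotation about the line joining them undetermined in this Euclidean formulation; this is a choice of the sketch, not a statement about the disparity-space system of the paper.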



Figure 1 shows how we estimate our region of interest on each stereo image. In short, 
we segment the subset of moving points of the scene whose distance to the camera 
is less than a fixed threshold. In our case, the planar motion is calculated in 
pixel units. In order to remove isolated small regions we apply a size filter. All the 
pictures shown correspond to the left image of the stereo pair. 
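The segmentation just described can be sketched with boolean masks. This is an illustrative sketch under stated assumptions: closeness to the camera is expressed as a disparity above a threshold (disparity grows as depth decreases), a pixel counts as moving when either motion component reaches a minimum magnitude, and the size filter is omitted. The function name, parameter names and thresholds are hypothetical.

```python
import numpy as np

def region_of_interest(disparity, motion_x, motion_y, min_disparity, min_motion=1.0):
    """Moving pixels (union of x- and y-motion) intersected with near pixels."""
    near = disparity > min_disparity
    moving = (np.abs(motion_x) >= min_motion) | (np.abs(motion_y) >= min_motion)
    return moving & near

# Tiny hypothetical maps (2 x 3 pixels), motion in pixel units.
disparity = np.array([[10.0, 10.0, 2.0], [10.0, 2.0, 2.0]])
motion_x = np.array([[2.0, 0.0, 2.0], [0.0, 0.0, 0.0]])
motion_y = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 0.0]])
roi = region_of_interest(disparity, motion_x, motion_y, min_disparity=5.0)
print(roi)
```

Only pixels that are both near and moving survive, which mirrors the union and intersection of the maps described for Figure 1.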




Fig. 1. This example corresponds to a left-right rotation of the head. Picture (a) represents the 
estimated stereo disparity map, picture (b) represents the x-motion dense map, picture (c) 
represents the y-motion dense map, and picture (d) represents the result of the union of picture 
(b) and picture (c) intersected with picture (a). 



Estimating dense disparity maps from two images is a very active field of research [10]. Very 
recently, new energy minimization algorithms based on graph cuts have been proposed 
[2,7]. These algorithms achieve a very good compromise between time 
efficiency and accuracy of the estimation [7]. Since the implementation of these 
algorithms only depends on one free parameter, $\lambda>0$, associated with the scale of the 
estimation [10], very different estimations can be achieved by varying the $\lambda$ value. 
Low values of $\lambda$ provide us with more accurate estimations but a larger number of 
points will be undefined. A scale combination scheme therefore provides us with a 
better estimation. In our case, four different scales ($\lambda$=3,5,10,30) have been 
considered in order to estimate the disparity maps. The combination scheme defines 
the disparity value on each pixel as the value of the lowest scale in which the disparity 
is defined. For motion estimation only the lowest scale has been used, since the other 
scales do not contribute much information. In order to obtain as accurate a 
segmentation as possible, there has been some loss in computational efficiency. 
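The scale combination rule (take, for each pixel, the value from the lowest scale at which the disparity is defined) can be sketched as follows. The sketch assumes that undefined pixels are marked with NaN and that the maps are passed in order of increasing scale; both conventions, and the function name, are assumptions made for illustration.

```python
import numpy as np

def combine_scales(maps):
    """Per pixel, keep the value of the first (lowest-scale) map that defines it.

    maps: list of equally shaped 2-D float arrays, ordered from the lowest to
    the highest scale, with NaN marking undefined pixels.
    """
    combined = np.full(maps[0].shape, np.nan)
    for m in maps:
        fill = np.isnan(combined) & ~np.isnan(m)
        combined[fill] = m[fill]
    return combined

nan = np.nan
m3 = np.array([[1.0, nan], [nan, nan]])    # lowest scale: accurate but sparse
m5 = np.array([[9.0, 2.0], [nan, nan]])
m10 = np.array([[9.0, 9.0], [3.0, nan]])   # higher scale: denser
combined = combine_scales([m3, m5, m10])
print(combined)
```

Pixels that no scale defines remain NaN, so a later step can still tell them apart from valid disparities.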




Fig. 2. The first four columns of each row show the stereo disparity map from a stereo image, 
and the x-motion estimation from two consecutive stereo images, respectively, for different $\lambda$ 
values. The last column shows the resulting estimation from combining the different scales. All 
these images correspond to the left image of the stereo pair. 

The first row of Figure 2 shows the stereo disparity map estimation from a stereo 
image for different values of $\lambda$, together with the final estimation obtained by combining 
the different scales. The second row shows the x-motion map estimation from two 
consecutive stereo images. The use of multiple scales 
does not greatly improve the first scale estimation in the case of motion estimation. 
However, the combination of different scales proves to be very useful when the stereo 
disparity map is estimated. 



5 Experimental Results 

Experiments to estimate 3D rigid facial motion have been carried out from different 
stereo image sequences captured by a Point Grey stereo camera (Bumblebee) watching 
an actor moving his face freely. A fixed window inside the captured images defined the 
sub-images of interest. Our algorithm was applied to the image sequence defined by 
the sub-images. The proposed algorithm was applied on every two consecutive stereo 
images in the sequence. In order to assess the goodness of the estimation process we 
synthesized a new sequence of images by interpolating from the estimated motions 
and the original sequence. 

Figure 3 shows six images, sampled ten frames apart, from one of the stereo 
sequences in our examples. It can be seen how the strength and unpredictability of 
local facial motions make it difficult to use interest points in the estimation process. 
Figure 4 shows how accurate the estimated motion is for a particular sequence. We 
compare the norm of the difference between two consecutive images with the norm 
of the residual calculated as the difference between an original image and its 
corresponding synthetic image. The large decrease in the norm of the difference image from 
the first case to the second case shows that the estimated motion is correct and precise 
enough. We should point out that it is difficult to visualize the accuracy of the 
parameter estimation from this type of graph, but we prefer this type because it is 
much more difficult to appreciate small residual motions by comparing static 
pictures by eye. 




Fig. 3. These pictures show local motions present in a standard stereo sequence. 




Fig. 4. This figure shows four graphs each of which is the norm of the gray level difference 
pixel-by-pixel from two images. The graphs Init.l-error and Init.r-error represent the case in 
which the images are two original consecutive right images and left images, respectively, of the 
sequence. The graphs Fitt_l.error and Fitt_r.error represent the case in which the two images 
are the original and synthesized one, using the proposed algorithm, for the right and left 
images, respectively. 
6 Discussion and Conclusions 

In this paper a new approach to estimate the 3D rigid motion of a deformable surface 
is proposed. The algorithm we propose is accurate and fast enough since no more than 
3-4 linear iterations plus 2 non-linear iterations are needed for convergence. The use 
of stereo images allows us to estimate the motion without the need for external 
information. This result will allow us to use this approach to remove the rigid motion 
component from the disparity vector to estimate local deformations. Of course, in this 
latter case and for large image sequences, the accumulated error might get very large. 
In order to avoid this situation, the present accuracy of the estimated motion based on 
two images must be improved. An alternative way to improve the estimation would 
be the joint use of all images in a bundle algorithm, but this approach is inapplicable 
in applications that demand time efficiency. 
Abstract. In the literature of computer vision and image processing, 
motion estimation and image registration problems are usually formu- 
lated as parametric fitting problems. Least Squares techniques have been 
extensively used to solve them, since they provide an elegant, fast and 
accurate way of finding the best parameters that fit the data. Never- 
theless, it is well known that least squares estimators are vulnerable to 
the presence of outliers. Robust techniques have been developed in order 
to cope with their presence in the data set. In this paper some 
of the most popular robust techniques for motion estimation problems 
are reviewed and compared. Experiments with synthetic image sequences 
have been done in order to test the accuracy and the robustness of the 
methods studied. 

1 Introduction 

Motion estimation and image registration are important problems in computer 
vision, and much effort has been devoted to solving them. Video compression, video 
processing, image mosaicing, video surveillance, robot navigation, medical 
imaging and traffic monitoring are only some of the many applications where motion 
estimation and image registration techniques can be applied. In the literature 
of computer vision and image processing there are different approaches to 
motion estimation; nevertheless, there are still challenging open problems in making 
solutions faster, more robust and accurate, or more general. 

The motion estimation problem can be formulated in many different ways. A 
well-known way of solving it is to approach it as a parametric fitting problem, 
where the parameters to be fitted are the motion parameters. Least squares 
provides a well-known way for parameter estimation. In general problems, least 
squares methods are based on finding the values for the parameters $\chi$ that best 
fit a model to a set S of r data measurements, i.e. minimizing an objective 
function O over a set S of r observation vectors, $S = \{L_1, \ldots, L_r\}$: 

$$O = \sum_{L_i \in S} (F_i(\chi, L_i))^2 \qquad (1)$$

where $\chi = (\chi^1, \ldots, \chi^p)$ is a vector of p parameters and $L_i$ is a vector of n 
observations $L_i = (L_i^1, \ldots, L_i^n)$, $i = 1, \ldots, r$. 

Least squares estimators assume that the noise corrupting the data is of 
zero mean and implicitly assume that the entire set of data can be interpreted 
by only one parameter vector of a given model. It is well known that least 
squares estimators are vulnerable to the violation of these assumptions. Robust 
techniques have been developed in order to cope with the presence of outliers in 
the data set. 

One of the oldest robust methods used in image analysis and computer vision 
is the Hough transform. The Hough transform is robust to outliers and it can 
be used to detect multiple models, but it attempts to solve a continuous problem 
with a discrete method and consequently it cannot produce accurate results. 
In addition, this algorithm needs high computational effort when the number of 
parameters is high, as in the case of using an affine model in motion estimation 
problems. 

Another popular robust technique is the Least Median of Squares 
(LMedS) method, which must yield the smallest value for the median of the squared 
residuals computed for the entire data set. The use of the median ensures that the 
estimate is very robust to outliers. The main drawback is that LMedS does not 
have a closed form solution. There are methods that can obtain an approximate 
solution, but they need high computational effort. Therefore, the computational 
complexity of LMedS algorithms does not allow them to be used in global motion 
estimation problems. Nevertheless they can be used to obtain an initial estimate 
of the parameters of the dominant motion (see [1]). 

The Regression Diagnostics or outlier rejection method [5] tries to 
iteratively detect possibly wrong data and reject them through analysis of the 
globally fitted model. This method has three steps: determine an initial fit to 
the whole set of data, using an ordinary least squares estimator; reject all data 
whose residuals exceed a threshold; determine a new fit with the remaining data 
set, and repeat. The success of this method clearly depends on the quality of 
the initial fit. Many improvements can be added to this method, for instance, 
estimating the initial fit using robust statistics [3] or adding an additional step that 
collects inliers among the outliers previously rejected [4]. 

Robust statistics, also called M-Estimators, is one of the most popular 
robust techniques. M-Estimators try to reduce the effect of outliers by replacing 
the squared residuals in Equation 1 by a kernel function $\rho$, as follows: 

$$O = \sum_{L_i \in S} \rho(e_i), \qquad (2)$$

where $\rho(e_i)$ is a symmetric, positive-definite function with a unique minimum 
at zero and $e_i = F_i(\chi, L_i)$. If $\rho(e_i) = e_i^2$, it is the least squares estimator. To 
analyze the behavior of an estimator, the Hampel influence function $\psi(e) = 
\partial\rho(e)/\partial e$ can be used. For the least squares estimator $\psi(e) = 2e$, i.e. the influence of the 
outliers increases linearly and without bound. For a comprehensive study of 
the performance of M-Estimators see [8]. In order to solve the robust estimation 
problem an iterative reweighted least squares (IRLS) technique is used. The idea 
of the IRLS is to assign weights $w_i$ to the residuals at each observation $L_i$, where 
the weights control the influence of the observations in the global estimation. 
High weights are assigned to “good” data and lower weights to outlying data. 
The M-Estimator problem is converted into an equivalent weighted least squares 
problem as follows: 

$$\sum_{L_i \in S} \rho(e_i) = \sum_{L_i \in S} w_i e_i^2 \qquad (3)$$

To minimize, we differentiate both sides and set them equal to zero; then the 
following expression is obtained for each $w_i$: 

$$w_i = \frac{\psi(e_i)}{e_i} \qquad (4)$$



Gradient weighted least squares (GWLS) [8] techniques can also be used 
in order to achieve robustness to outliers. The GWLS technique divides the original 
function by its gradient with respect to the observation in order to obtain a 
constant variance function. The solution of the GWLS problem can also be 
obtained using an IRLS technique, replacing the weight function by: 

$$w_i = \frac{1}{\sum_{j=1..n} \left( \partial F_i / \partial L_i^j \right)^2} \qquad (5)$$



In real motion estimation problems many of the previous robust techniques 
can be combined in order to deal with their problems. For instance, in [3] robust 
statistics, Hough transform and outlier rejection techniques are combined; in [1] 
LMedS and outlier rejection techniques are combined. 

In this paper, four robust motion estimation algorithms are compared. Three 
of them use a linear least squares estimator in order to estimate the motion 
parameters, and each of them makes use of a different robust technique in order 
to cope with outliers: M-Estimators, Gradient Weighted and Outlier Rejection. 
These algorithms are explained in Section 2.1. The last algorithm uses a 
non-linear least squares estimator and a gradient weighted-based technique to 
cope with outliers. It is explained in Section 2.2. Experiments with 
synthetic image sequences have been done in order to show the performance of the 
algorithms explained. They are shown in Section 3. 



2 Robust Motion Estimation Algorithms 

In motion estimation problems, the objective function O is based on the 
assumption that the grey level of all the pixels of a region R remains constant between 
two consecutive images in a sequence (Brightness Constancy Assumption, BCA). Using 
the BCA the objective function is expressed as follows: 

$$O_{BCA} = \sum_{(x_i,y_i) \in R} (I_1(x'_i,y'_i) - I_2(x_i,y_i))^2, \qquad (6)$$



Robust Techniques in Least Squares-Based Motion Estimation Problems 






where $I_1(x'_i,y'_i)$ is the grey level of the first image in the sequence at the 
transformed point $(x'_i,y'_i)$, and $I_2(x_i,y_i)$ is the grey level of the second image in 
the sequence at point $(x_i,y_i)$. Here, for each point i ($i = 1, \ldots, r$, with r being 
the number of pixels) the vector of observations $L_i$ has three elements (n = 3), 
$L_i = (x_i, y_i, I_2(x_i,y_i))$. The vector of parameters $\chi$ depends on the motion model 
used. 
The BCA cannot be used directly with an ordinary least squares (OLS) 
technique since it is not linear. The well-known solution to this problem derives 
the optic flow equation as the function to be minimized. Using the optic flow 
equation the objective function is expressed as follows: 

$$O_{OF} = \sum_{(x_i,y_i) \in R} (I_t + u_x I_x + u_y I_y)^2, \qquad (7)$$

where $I_x$, $I_y$ and $I_t$ are the spatial and temporal derivatives of the sequence. 

Nevertheless, it is possible to directly use the BCA using a non-linear least 
squares-based estimator. Generalized least-squares methods (GLS) [4] are an 
interesting approach to extend the applicability of least-squares techniques (e.g., 
to non-linear problems). GLS techniques can be successfully applied in motion 
estimation related problems [6,7]. 

2.1 OLS-Based Robust Motion Estimation Algorithms 

Motion Estimation. The solution of the motion estimation problem using an 
ordinary least squares method uses a Taylor series expansion that produces the 
well-known optic flow equation. The solution of $O_{OF}$ (Equation 7) is now obtained by 
setting to zero the derivatives with respect to each of the parameters of the motion 
model, and solving the resulting system of equations. The solution is obtained 
by solving the overdetermined linear equation system $A\chi = d$ using the closed-form 
solution $\chi = (A^T A)^{-1} A^T d$, where for affine motion $\chi = (a_1, b_1, c_1, a_2, b_2, c_2)$, 
and A (r × 6) and d (r × 1) are expressed as follows: 

$$A = (A_1, A_2, \ldots, A_r)^T, \qquad d = (d_1, d_2, \ldots, d_r)^T$$

$$A_i = (xI_x, yI_x, I_x, xI_y, yI_y, I_y)_{(1 \times 6)}, \qquad d_i = (xI_x + yI_y - I_t)_{(1 \times 1)} \qquad (8)$$
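The construction of A and d for the affine model can be sketched as below. This is an illustrative sketch, not the HIOLS implementation: the derivatives are taken as given one-dimensional arrays, there is no pyramid and no iterative alignment, and `np.linalg.lstsq` is used in place of the explicit normal-equation inverse (both give the same solution when A has full column rank). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def affine_flow_ols(x, y, Ix, Iy, It):
    """Solve A chi = d in the least-squares sense for chi = (a1, b1, c1, a2, b2, c2)."""
    A = np.column_stack([x * Ix, y * Ix, Ix, x * Iy, y * Iy, Iy])  # rows A_i
    d = x * Ix + y * Iy - It                                       # entries d_i
    chi = np.linalg.lstsq(A, d, rcond=None)[0]
    return chi

# Synthetic check: derivatives generated so that a known affine motion is exact.
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 500), rng.uniform(0, 100, 500)
Ix, Iy = rng.normal(size=500), rng.normal(size=500)
chi_true = np.array([1.02, 0.01, 0.3, -0.02, 0.98, -0.4])
A = np.column_stack([x * Ix, y * Ix, Ix, x * Iy, y * Iy, Iy])
It = x * Ix + y * Iy - A @ chi_true      # temporal derivative implied by chi_true
chi_est = affine_flow_ols(x, y, Ix, Iy, It)
print(chi_est)
```

With noise-free synthetic derivatives the estimate matches the generating parameters up to rounding; with real image derivatives the fit is only accurate for sub-pixel displacements, as the text notes.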

The OLS-based motion estimation is accurate only when the frame-to-frame 
displacements due to the motion are a fraction of a pixel. The accuracy of the 
estimation can be improved using an iterative alignment procedure and a 
multi-resolution pyramid, see [2] for details. We will refer to this algorithm as 
Hierarchical and Incremental Ordinary Least Squares (HIOLS). In order to cope with 
outliers an IRLS-based technique is used. For the sake of clarity, the IRLS process 
is described in 4 steps: 

1. Create a diagonal matrix of weights W with dimensions r × r. Each $w_i$ 
measures the influence of the observation $L_i$ in the global estimation. Set all 
$w_i = 1$. 

2. Estimate the motion parameters using the equation: $\chi = (A^T W A)^{-1} A^T W d$. 

3. Improve the weights using the parameters previously estimated. 

4. Repeat until some termination condition is met. 
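The four steps above can be sketched on a generic linear system. The sketch is an illustration under assumptions: the weight function is the standard Huber weight with tuning constant k = 1.345 (a conventional choice, not taken from the paper), the termination condition is a small change in the parameters, and the toy problem is a line fit and not a motion estimation. The function names are hypothetical.

```python
import numpy as np

def huber_weights(e, k=1.345):
    """Huber weights: 1 for small residuals, k/|e| for large ones."""
    a = np.abs(e)
    return np.where(a < k, 1.0, k / np.maximum(a, 1e-12))

def irls(A, d, weight_fn=huber_weights, n_iter=20, tol=1e-10):
    w = np.ones(len(d))                                # step 1: all weights set to 1
    chi = np.zeros(A.shape[1])
    for _ in range(n_iter):
        AtW = A.T * w                                  # A^T W without building W
        chi_new = np.linalg.solve(AtW @ A, AtW @ d)    # step 2: weighted estimate
        w = weight_fn(A @ chi_new - d)                 # step 3: update the weights
        converged = np.linalg.norm(chi_new - chi) < tol
        chi = chi_new
        if converged:                                  # step 4: termination test
            break
    return chi, w

# Toy problem: the line d = 2 x + 1 with two gross outliers.
x = np.arange(20.0)
A = np.column_stack([x, np.ones_like(x)])
d = 2.0 * x + 1.0
d[[3, 15]] += 50.0
chi_ols = np.linalg.lstsq(A, d, rcond=None)[0]
chi_irls, w = irls(A, d)
print(chi_ols, chi_irls)
```

The reweighted estimate ends up much closer to the generating line than the ordinary least squares fit, and the two corrupted observations receive small weights.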

This process is integrated in the HIOLS algorithm improving the weights after 
each parameter estimation is performed. Three robust techniques are studied: M- 
Estimators (R-HIOLS algorithm), Gradient Weighted-based (GW-HIOLS) and 
Outlier Rejection (OR-HIOLS). 



R-HIOLS. The weights are calculated using the Huber M-Estimator as follows: 

$$w_i = \begin{cases} 1.0 & \text{if } |e_i| < k \\ k/|e_i| & \text{otherwise} \end{cases} \qquad (9)$$



GW-HIOLS. The weights are calculated using Equation 5. 



OR-HIOLS. In the outlier rejection technique the outliers do not have any influence 
on the estimation of the parameters. Now, the weights are set to 1 for inliers and 
to 0 for outliers. The threshold is calculated using a scale measure $s(\chi)$ based 
on the median of the residuals as follows: 

$$s(\chi) = 1.4826 \cdot \mathrm{median}(|e_i - \mathrm{median}(e_i)|). \qquad (10)$$

The estimated scale is used to reject outliers: $w_i = 0$ if $|e_i| > s(\chi)$, i.e. the 
observation i is considered an outlier. Otherwise, it is considered an inlier 
and $w_i = 1$. Other similar scale measures can be used ([1], [4]). 
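The scale measure of Equation 10 and the resulting 0/1 weights can be sketched directly. The sketch applies the rejection rule as it reads in the text (a residual is rejected when its magnitude exceeds the scale itself); the function names and the sample residuals are hypothetical.

```python
import numpy as np

def mad_scale(e):
    """Scale estimate 1.4826 * median(|e_i - median(e_i)|)."""
    e = np.asarray(e, dtype=float)
    return 1.4826 * np.median(np.abs(e - np.median(e)))

def rejection_weights(e):
    """Weight 0 for residuals whose magnitude exceeds the scale, 1 otherwise."""
    e = np.asarray(e, dtype=float)
    return (np.abs(e) <= mad_scale(e)).astype(float)

residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 8.0]
s = mad_scale(residuals)
weights = rejection_weights(residuals)
print(s, weights)
```

Because the threshold equals the scale itself, the rule is strict: besides the gross outlier 8.0, the moderate residual -0.2 is also rejected in this example.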



2.2 Generalized Least Squares 

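The Generalized Least Squares (GLS) algorithm is based on minimizing an 
objective function O (see Equation 1) over a set S of r observation vectors, 
$S = \{L_1, \ldots, L_r\}$. In general, this equation can be non-linear, but it can be 
linearized using the Taylor expansion and neglecting higher order terms. This 
implies that an iterative solution has to be found. At each iteration, the 
algorithm estimates $\Delta\chi$, which improves the parameters as follows: $\chi_{t+1} = \chi_t + \Delta\chi$. 
The increment $\Delta\chi$ is calculated (see [4]) using the following expressions: 

$$\Delta\chi = (A^T (BB^T)^{-1} A)^{-1} A^T (BB^T)^{-1} W \qquad (11)$$

$$B = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_r \end{pmatrix}_{(r \times (r \times n))}, \quad A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_r \end{pmatrix}_{(r \times p)}, \quad W = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_r \end{pmatrix}$$

$$B_i = \left( \frac{\partial F_i(\chi_t, L_i)}{\partial L_i^1}, \ldots, \frac{\partial F_i(\chi_t, L_i)}{\partial L_i^n} \right), \quad A_i = \left( \frac{\partial F_i(\chi_t, L_i)}{\partial \chi^1}, \ldots, \frac{\partial F_i(\chi_t, L_i)}{\partial \chi^p} \right), \quad W_i = -F_i(\chi_t, L_i)$$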
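Since B is block diagonal with one 1 × n row block per observation, the product B Bᵀ is an r × r diagonal matrix whose i-th entry is the squared norm of B_i, so the GLS increment reduces to a weighted least squares solve. The sketch below illustrates this under assumptions: the rows B_i are stored in an r × n array, and the random A, B_i and W are placeholders, not image data. The function name is hypothetical.

```python
import numpy as np

def gls_increment(A, B_rows, W):
    """Increment (A^T (B B^T)^-1 A)^-1 A^T (B B^T)^-1 W for block-diagonal B."""
    s = np.sum(B_rows ** 2, axis=1)   # diagonal of B B^T, one entry per observation
    AtS = A.T / s                     # A^T (B B^T)^{-1}
    return np.linalg.solve(AtS @ A, AtS @ W)

# Placeholder data: r = 8 observations, n = 2 observation components, p = 3 parameters.
rng = np.random.default_rng(0)
r, n, p = 8, 2, 3
A = rng.normal(size=(r, p))
B_rows = rng.normal(size=(r, n))
W = rng.normal(size=r)
delta = gls_increment(A, B_rows, W)

# Same quantity with the full r x (r*n) block-diagonal matrix B, for comparison.
B = np.zeros((r, r * n))
for i in range(r):
    B[i, i * n:(i + 1) * n] = B_rows[i]
S_inv = np.linalg.inv(B @ B.T)
delta_full = np.linalg.solve(A.T @ S_inv @ A, A.T @ S_inv @ W)
print(np.allclose(delta, delta_full))
```

The diagonal form avoids building the large sparse matrix B, and it makes explicit that each observation is weighted by the inverse squared norm of its gradient with respect to the observation.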
In our motion estimation problems the objective function is $O_{BCA}$ (see 
Equation 6). Here, for each point i ($i = 1 \ldots r$, with r being the number of pixels) the 
vector of observations $L_i = (x_i, y_i, I_2(x_i,y_i))$ has three elements (n = 3): column, 
row and grey level of the second image at these coordinates. The affine motion model 
is used in this work, which is able to cope with translations, scaling, rotation 
and shear of images, and it is defined with a vector $\chi = (a_1, b_1, c_1, a_2, b_2, c_2)$ 
(p = 6). Therefore, $B_i$, $A_i$ and $W_i$ are expressed as follows: 

$$B_i = (a_1 I_x^1 + a_2 I_y^1 - I_x^2, \; b_1 I_x^1 + b_2 I_y^1 - I_y^2, \; -1)$$

$$A_i = (x_i I_x^1, \; y_i I_x^1, \; I_x^1, \; x_i I_y^1, \; y_i I_y^1, \; I_y^1), \qquad W_i = -(I_1(x'_i,y'_i) - I_2(x_i,y_i)) \qquad (12)$$

where $I_x^1$, $I_y^1$ are the gradients of the first image at the pixel $(x'_i,y'_i)$ in the x and y 
directions, and $I_x^2$, $I_y^2$ are the gradients of the second image at the pixel $(x_i,y_i)$ in the x 
and y directions. 

Similarly to the OLS estimator, a multi-resolution pyramid is used in order to 
cope with large motion, but the iterative nature of the GLS estimator makes 
the alignment process of the HIOLS algorithm unnecessary. We name 
this algorithm Hierarchical Generalized Least Squares (HGLS) (see [7] for 
details). The robustness of the algorithm is obtained through the matrix $BB^T$, 
which can be viewed as a matrix of weights. Clearly the HGLS algorithm uses a 
gradient weighted-based technique in order to cope with outliers. 



3 Experimental Work 

In order to test the accuracy and robustness of the proposed methods two 
synthetic experiments have been carried out. In the first experiment, 100 
transformed images have been created using random values of the affine 
parameters between the limits: $a_1, b_2 \in [0.85, 1.15]$, $a_2, b_1 \in [0.0, 0.15]$ and 
$c_1, c_2 \in [-10.0, 10.0]$. The reference image and an example of a transformed image are 
shown in Figure 1(a, b). Table 1 shows the averages of the differences 
between the real values and the estimated values for the affine parameters, for 
each method. 



Table 1. Error in the estimation of the motion parameters for the first experiment. 



Algorithm   a1        b1        c1       a2        b2        c2
R-HIOLS     9.7E-07   1.3E-06   0.0001   9.8E-07   9.1E-07   0.0011
GW-HIOLS    2.8E-06   2.2E-06   0.0003   1.6E-06   2.8E-06   0.0003
OR-HIOLS    0.0002    0.0001    0.0043   4.4E-06   6.0E-06   0.0013
HGLS        2.8E-05   3.9E-05   0.0052   7.9E-05   6.4E-05   0.0103



The second experiment has been done in order to test the robustness of 
the algorithms. For this purpose, a patch of 150 × 150 pixels is added to the 
reference image and a patch of 100 × 100 pixels is added to the transformed 
images. The patch undergoes a different motion than that of the background. The 
reference image and an example of a transformed image are shown in Figure 1(d, 
e). The averages of the differences between the real values of the background and 
the estimated values are shown in Table 2. All the methods accurately extract the 
motion of the background, i.e. the pixels belonging to the patch are considered 
as outliers, and therefore, they have not influenced the estimation of the motion of 
the background. 

Fig. 1. Test image sequences: 1st column, reference images; 2nd column, transformed 
ones; 3rd column, likelihood images: dark grey values for low likelihood, i.e. outliers. 



Table 2. Error in the estimation of the motion parameters for the second experiment. 



Algorithm   a1        b1        c1       a2        b2        c2
R-HIOLS     1.8E-06   2.5E-06   0.0003   2.8E-06   1.2E-06   0.0003
GW-HIOLS    7.5E-05   4.8E-05   0.0069   1.4E-05   2.9E-05   0.0057
OR-HIOLS    0.0002    0.0001    0.0167   7.8E-05   6.5E-05   0.0079
HGLS        5.6E-05   5.1E-05   0.0065   6.7E-05   7.7E-05   0.0096



The results obtained for the experiments show that all methods obtain 
accurate estimations of the motion parameters of the dominant motion present in 
the sequence, even in the case of a high number of outliers as in the second 
experiment. No significant differences among the methods can be found. However, 
the results show the benefits of using the HGLS technique since it can obtain 
estimates as accurate as the other methods and it is more general and simpler, 
mainly due to the fact that it does not need the alignment process, which can 
introduce unexpected errors and can increase the processing time, especially in 
the case of large images. 

In order to illustrate how the outliers have been correctly rejected, a 
likelihood image has been created. For each pixel, the likelihood measure 
$L(\chi, p_i) = e^{-0.5 \, e_i^2/\sigma^2}$ (see [3]) of the pixel $p_i$ belonging to the model estimated with 
parameters $\chi$ is calculated. Light grey values are used to represent high values of 
$L(\chi, p_i)$ in the likelihood image. On the other hand, dark grey values are used 
for low values of $L(\chi, p_i)$, i.e. for outliers. Figures 1(c, f) show an example of 
the likelihood image for the samples of the experiments. They have been 
created using the HGLS algorithm, but similar results are obtained using 
the other algorithms. They show how the outliers have been correctly detected 
and rejected. 
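A likelihood image of this kind can be sketched as a per-pixel mapping from residuals to grey levels. The sketch assumes a Gaussian-shaped likelihood exp(-0.5 e² / σ²) in which σ is a user-chosen noise scale; the value of σ, the 8-bit grey-level mapping and the function name are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def likelihood_image(residuals, sigma):
    """Map per-pixel residuals to grey levels: light = likely inlier, dark = outlier."""
    L = np.exp(-0.5 * (residuals / sigma) ** 2)   # likelihood in [0, 1]
    return np.round(255 * L).astype(np.uint8)

# Hypothetical 2 x 2 residual map: a perfect fit, two moderate residuals, one outlier.
residuals = np.array([[0.0, 1.0], [3.0, 100.0]])
img = likelihood_image(residuals, sigma=1.0)
print(img)
```

Pixels with a zero residual map to white and gross outliers map to black, which matches the convention described for the likelihood images in Figure 1.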

4 Conclusions 

In this paper, four robust least squares-based motion estimation techniques have 
been explained, implemented and compared. They use M-Estimators, gradient 
weighted and outlier rejection techniques in order to achieve robustness in the 
estimation of the motion parameters. 

The performance of the four algorithms has been tested using synthetic 
image sequences with the presence of outliers. The four methods obtain accurate 
estimations of the dominant motion, even in the case of a high number of 
outliers. No significant differences among the methods were found. However, the 
results show the benefits of using the HGLS technique since it can yield estimates 
as accurate as the other methods while it is more general and simpler, mainly 
due to the fact that it does not need the alignment process, which can introduce 
unexpected errors and can increase the processing time, especially in the case 
of large images. 

References 

1. Alireza Bab-Hadiashar and David Suter. Robust optic flow computation.
International Journal of Computer Vision, 29(1):59-77, 1998.

2. J.R. Bergen, P.J. Burt, R. Hingorani, and S. Peleg. A three-frame algorithm for
estimating two-component image motion. PAMI, 14(9):886-896, September 1992.

3. M. Bober and J. V. Kittler. Estimation of complex multimodal motion: An approach
based on robust statistics and Hough transform. IVC, 12(10):661-668, December
1994.

4. G. Danuser and M. Stricker. Parametric model-fitting: From inlier characterization
to outlier detection. PAMI, 20(3):263-280, March 1998.

5. M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motion.
IJCV, 12(1):5-16, February 1994.

6. R. Montoliu and F. Pla. Multiple parametric motion model estimation and
segmentation. In ICIP01, 2001 International Conference on Image Processing,
volume II, pages 933-936, October 2001.






7. R. Montoliu, V. J. Traver, and F. Pla. Log-polar mapping in generalized least-squares
motion estimation. In Proceedings of 2002 IASTED International Conference on
Visualization, Imaging, and Image Processing (VIIP'2002), pages 656-661, September
2002.

8. Z. Zhang. Parameter-estimation techniques: A tutorial with application to conic
fitting. Image and Vision Computing, 15(1):59-76, 1997.



Inexact Graph Matching for Facial Feature 
Segmentation and Recognition in Video 
Sequences: Results on Face Tracking 



Ana Beatriz V. Graciano^1, Roberto M. Cesar Jr.^1, and Isabelle Bloch^2

^1 Department of Computer Science, IME, University of São Paulo. São Paulo, Brazil.

{cesar,abvg}@ime.usp.br

^2 Signal and Image Processing Department, CNRS UMR 5141 LTCI, École Nationale
Supérieure des Télécommunications. Paris, France.

Isabelle.Bloch@enst.fr



Abstract. This paper presents a method for the segmentation and 
recognition of facial features and face tracking in digital video sequences 
based on inexact graph matching. It extends a previous approach pro- 
posed for static images to video sequences by incorporating the temporal 
aspect that is inherent to such sequences. Facial features are represented 
by attributed relational graphs, in which vertices correspond to different 
feature regions and edges to relations between them. A reference model 
is used and the search for an optimal homomorphism between its corres- 
ponding graph and that of the current frame leads to the recognition. 



1 Introduction 

This paper deals with segmentation and recognition of facial features in digital 
video sequences through the use of an inexact graph matching technique. The 
proposed technique constitutes a first approach to the generalization of the me- 
thodology developed in [3,4] for facial feature segmentation and recognition in 
static images. 

This extension is motivated by the fact that the subject of face analysis and 
recognition arises in various computer vision applications involving human acti- 
vity recognition, such as affective computing, surveillance, teleconferencing and 
multimedia databases. Since many of these involve video sequence processing, 
it is also interesting to incorporate the notion of motion-based recognition [8] in 
the methodology. 

The main idea is to model the target facial features in a given face through 
an attributed relational graph (ARG), a structure in which vertices represent the 
facial features and their attributes, while edges represent spatial relations among 
them. The model image is manually segmented into the facial features of interest 
and relations are computed to derive a model ARG. In each input image where 
recognition has to be performed, i.e. each frame, its gradient is extracted and a 
watershed algorithm is applied to it. Then, an input ARG is obtained from the 
resulting oversegmented image as for the model. The recognition step relies on an 



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 71—78, 2003. 
(c) Springer- Verlag Berlin Heidelberg 2003 






inexact graph matching procedure that finds a suitable homomorphism between 
the graph obtained from the model and the one obtained from the image. 

The technique of inexact graph matching has been extensively studied in 
several different domains [2,9,10], such as pattern recognition, computer vision, 
cybernetics, among others. This approach is justified here due to the difficulty 
in finding an isomorphism between the model image graph and the input image 
one: since the latter represents an oversegmented image, it is not possible to 
expect a bijective correspondence between both structures. 

It is worth noting that the term facial feature recognition, as used here, means
that each facial feature of interest will be located and classified as such.
It is therefore not related to recognition in the sense of matching a face
against a known database of faces (no face recognition is performed).
Based on [3] where the static methodology is introduced and on [4] where the 
optimization of the graph matching process is addressed using several methods, 
the main contribution of the present work is to develop a methodology that can 
be applied to video sequences, i.e. incorporating the temporal dimension. 

This paper is organized as follows. Section 2 explains how a face is modeled 
as an attributed relational graph. Section 3 explains the inexact graph matching 
step of the methodology. Section 4 shows how the tracking process is performed 
throughout the video frames. Section 5 presents some obtained results and con- 
clusions. 



2 Face Representation 

Attributed Relational Graphs. In this work, a directed graph will be denoted by
$G = (N, E)$, where $N$ represents the set of vertices of $G$ and
$E \subseteq N \times N$ the set of edges. Two vertices $a, b$ of $N$ are said
to be adjacent if $(a, b) \in E$. When each vertex of $G$ is adjacent to all
others, $G$ is said to be complete. Furthermore, $|N|$ denotes the number of
vertices in $G$, while $|E|$ denotes its number of edges.

An attributed relational graph (also referred to as ARG) is a graph in which
attribute vectors are assigned to vertices and to edges. Formally, we define an
ARG as $G = (N, E, \mu, \nu)$, where $N$ represents the set of vertices of $G$
and $E \subseteq N \times N$ the set of edges. Furthermore,
$\mu : N \to L_N$ assigns an attribute vector to each vertex of $G$, while
$\nu : E \to L_E$ assigns an attribute vector to each edge in $G$.
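
As an illustration of the definition above, an ARG can be held in two dictionaries, one per attribute function. This is only a minimal sketch: the class name `ARG`, its methods and the attribute values in the usage example are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph G = (N, E, mu, nu) (illustrative sketch)."""
    mu: dict = field(default_factory=dict)  # vertex -> vertex attribute vector
    nu: dict = field(default_factory=dict)  # (a, b) -> edge attribute vector

    def add_vertex(self, a, attrs):
        self.mu[a] = tuple(attrs)

    def add_edge(self, a, b, attrs):
        # E is a subset of N x N, so both endpoints must already be vertices.
        if a not in self.mu or b not in self.mu:
            raise KeyError("both endpoints must be vertices of the graph")
        self.nu[(a, b)] = tuple(attrs)

    @property
    def vertices(self):
        return set(self.mu)

    @property
    def edges(self):
        return set(self.nu)

    def is_complete(self):
        # Complete: every vertex is adjacent to all others (self-loops ignored).
        n = len(self.mu)
        return sum(1 for (a, b) in self.nu if a != b) == n * (n - 1)


# Usage with made-up attribute values.
g = ARG()
g.add_vertex("lips", (0.4, 0.2, "mouth"))
g.add_vertex("nostril", (0.6, 0.1, "nose"))
g.add_edge("lips", "nostril", (0.0, -0.1, 0.9))
```

Storing the attribute functions directly, rather than separate vertex and edge sets, keeps $N$ and $E$ consistent with the domains of $\mu$ and $\nu$ by construction.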

The structure of a face can be thought of as being a collection of features
(e.g. lips, eyebrows, nostrils, chin) which are somehow related in terms of their
relative positions on the face. In the proposed model, facial feature regions are
represented by vertices in a graph, while relations between them are represented
by edges. The attribute vectors $\mu$ and $\nu$ may also be called object and
relational attributes, respectively. The former refers to connected regions in
the image and the latter to the spatial arrangement of the regions.

Attributes. The object and relational attributes convey the knowledge about
faces to the ARG structure. The attributes which have been considered in this
work are the same as in [3]. Let us consider an ARG $G = (N, E, \mu, \nu)$ and
any two vertices $a, b$ in $N$.

The object attribute $\mu(a)$ is defined as:

$$\mu(a) = (g(a), w(a), l(a)). \tag{1}$$

The term $g(a)$ corresponds to the average grey-level of the image region
associated to vertex $a$, whereas $w(a)$ is a coefficient obtained from the
application of a Morlet wavelet. Both $g(a)$ and $w(a)$ are normalized between
0 and 1 with respect to the maximum possible grey-level. Finally, $l(a)$ is a
region label.

The relational attribute $\nu(a, b)$, for $(a, b)$ in $E$, is defined as:

$$\nu(a, b) = (\vec{v}, sym(a, b)). \tag{2}$$

The first attribute is the vector $\vec{v} = (p_b - p_a) / (2 d_{max})$, where
$d_{max}$ is the maximum distance between any two points of the input graph,
while $p_a$ and $p_b$ denote the centroids of the image regions to which
vertices $a$ and $b$ correspond. The term $sym(a, b)$ denotes a reflectional
symmetry calculated as described in [1].

The Face Model. A face model image is used as a reference to recognize facial 
features of interest. This image can be for instance the first frame of a given 
video sequence. It is manually segmented into facial feature regions of interest 
and the landmark of each region is calculated. Then, the corresponding ARG is 
derived. 

The model graph should contain vertices associated to each target facial 
feature region (e.g. lips, iris, eyebrows, skin). However, if a single feature presents 
considerable variability within its domain, it might need to be subdivided into 
smaller sub-regions, so that the averages considered when calculating both vertex 
and edge attributes can be more representative. 

3 The Facial Feature Recognition Process 

Graph Homomorphism. Consider two ARGs $G_1 = (N_1, E_1, \mu_1, \nu_1)$ derived
from the image and $G_2 = (N_2, E_2, \mu_2, \nu_2)$ derived from the model. They
will be called input and model graphs respectively. Also, subscripts will be
used to refer to vertices and edges in each graph, e.g. $a_1 \in N_1$ is a
vertex in $G_1$, $(a_2, b_2) \in E_2$ is an edge in $G_2$. It is also important
to notice that, since $G_1$ results from an oversegmented image, $|N_1|$ is
much greater than $|N_2|$ in general.

An association graph $G_A$ between $G_1$ and $G_2$ is defined as the complete
graph $G_A = (N_A, E_A)$, where $N_A = N_1 \times N_2$ and
$E_A = E_1 \times E_2$.

A graph homomorphism $h$ between $G_1$ and $G_2$ is a mapping
$h : N_1 \to N_2$ such that $\forall a_1 \in N_1, \forall b_1 \in N_1,
(a_1, b_1) \in E_1 \Rightarrow (h(a_1), h(b_1)) \in E_2$. This definition
assumes that all vertices in $G_1$ should be mapped to $G_2$.
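
The homomorphism condition can be checked directly from its definition. The following is a small sketch; the function name `is_homomorphism` and the representation of graphs as Python sets are assumptions made for illustration only.

```python
def is_homomorphism(h, n1, e1, n2, e2):
    """Check that h maps every vertex of G1 into G2 and preserves edges.

    h  : dict mapping vertices of G1 to vertices of G2
    n1 : set of vertices of G1, e1 : set of edges (a1, b1) of G1
    n2 : set of vertices of G2, e2 : set of edges (a2, b2) of G2
    """
    # Every vertex of G1 must have an image that belongs to G2.
    if any(a not in h or h[a] not in n2 for a in n1):
        return False
    # Every edge of G1 must be sent to an edge of G2.
    return all((h[a], h[b]) in e2 for (a, b) in e1)


# Three input regions mapped onto two model regions.
n1, e1 = {1, 2, 3}, {(1, 2), (2, 3)}
n2, e2 = {"x", "y"}, {("x", "y"), ("y", "x")}
good = is_homomorphism({1: "x", 2: "y", 3: "x"}, n1, e1, n2, e2)
bad = is_homomorphism({1: "x", 2: "x", 3: "y"}, n1, e1, n2, e2)
```

In the usage example the second mapping fails because the edge (1, 2) would be sent to ("x", "x"), which is not an edge of the model graph.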

Finding a homomorphism between $G_1$ and $G_2$ is essential to the face feature
recognition process. Since $|N_1|$ is greater than $|N_2|$, a suitable
homomorphism between the input and model graphs should map distinct vertices of
$G_1$ into a single vertex of $G_2$, which corresponds to merging coherent
sub-regions in the input oversegmented image.

As proposed in [3], a solution for finding a homomorphism between $G_1$ and
$G_2$ may be defined as a complete sub-graph $G_S = (N_S, E_S)$ from the
association graph $G_A$, in which
$N_S = \{(a_1, a_2), a_1 \in N_1, a_2 \in N_2\}$ such that
$\forall a_1 \in N_1, \exists a_2 \in N_2, (a_1, a_2) \in N_S$, and
$\forall (a_1, a_2) \in N_S, \forall (a_1', a_2') \in N_S,
a_1 = a_1' \Rightarrow a_2 = a_2'$, assuring that each vertex from the input
graph corresponds to exactly one vertex of the model graph and
$|N_S| = |N_1|$. It should be clear that such a solution only considers the
structures of $G_1$ and $G_2$, and that it gives rise to many possible
homomorphisms between both graphs.



Objective Function. In order to evaluate the quality and suitability of a given 
homomorphism between the input and model graphs, an objective function must 
be defined. It should consider not only the structure of the graphs, but also the 
attributes of the facial features and their relations. In this paper, the assessment 
of a certain homomorphism is accomplished through the minimization of the 
following function: 



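$$f(G_S) = \frac{\alpha}{|N_S|} \sum_{(a_1, a_2) \in N_S} c_N(a_1, a_2)
+ \frac{(1 - \alpha)}{|E_S|} \sum_{e \in E_S} c_E(e) \tag{3}$$

where $c_N$ and $c_E$ are dissimilarity measures given as follows:

$$c_N(a_1, a_2) = \begin{cases}
\gamma_N |g_1(a_1) - g_2(a_2)| + (1 - \gamma_N) |w_1(a_1) - w_2(a_2)|, & \text{if } l(a_1) = l(a_2) \\
\infty, & \text{otherwise}
\end{cases} \tag{4}$$

$$c_E(e) = \gamma_E \phi_v + (1 - \gamma_E) \phi_{sym} \tag{5}$$

and $\phi_v$ and $\phi_{sym}$ are defined as

$$\phi_v = \gamma_v \big| \|\vec{v}_1\| - \|\vec{v}_2\| \big| + (1 - \gamma_v) \frac{|\cos\theta - 1|}{2}, \qquad
\phi_{sym} = |sym(a_1, b_1) - sym(a_2, b_2)|. \tag{6}$$

In this case, $\cos\theta = \frac{\vec{v}_1 \cdot \vec{v}_2}{\|\vec{v}_1\| \|\vec{v}_2\|}$.
The values $\gamma_N$, $\gamma_E$ and $\gamma_v$ are weighting parameters.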
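
To make the structure of the objective function concrete, the sketch below evaluates a weighted sum of an average vertex dissimilarity and an average edge dissimilarity for a given mapping. It rests on several assumptions that should be checked against [3]: both sums are normalized by the number of terms, the angular term is taken as |cos θ − 1|/2, edges of the input graph whose image is not a model edge are simply skipped, and all weights default to the hypothetical value 0.5. Function and parameter names are illustrative.

```python
import math


def c_vertex(mu1, mu2, gamma_n=0.5):
    """Vertex dissimilarity; mu = (grey level, wavelet coefficient, label)."""
    g1, w1, l1 = mu1
    g2, w2, l2 = mu2
    if l1 != l2:
        return math.inf  # vertices with different labels must not be matched
    return gamma_n * abs(g1 - g2) + (1 - gamma_n) * abs(w1 - w2)


def c_edge(nu1, nu2, gamma_e=0.5, gamma_v=0.5):
    """Edge dissimilarity; nu = ((vx, vy), sym)."""
    (v1, s1), (v2, s2) = nu1, nu2
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # Assumed convention: a zero-length vector contributes no angular penalty.
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2) if n1 > 0 and n2 > 0 else 1.0
    phi_v = gamma_v * abs(n1 - n2) + (1 - gamma_v) * abs(cos_t - 1) / 2
    phi_sym = abs(s1 - s2)
    return gamma_e * phi_v + (1 - gamma_e) * phi_sym


def objective(h, mu1, nu1, mu2, nu2, alpha=0.5):
    """Cost of the mapping h (input vertex -> model vertex); lower is better."""
    vertex_cost = sum(c_vertex(mu1[a], mu2[h[a]]) for a in h) / len(h)
    pairs = [(e, (h[e[0]], h[e[1]])) for e in nu1 if (h[e[0]], h[e[1]]) in nu2]
    edge_cost = sum(c_edge(nu1[e], nu2[f]) for e, f in pairs) / len(pairs) if pairs else 0.0
    return alpha * vertex_cost + (1 - alpha) * edge_cost


# Made-up attributes: a two-vertex graph matched against itself.
mu = {"a": (0.2, 0.3, "eye"), "b": (0.8, 0.1, "lips")}
nu = {("a", "b"): ((0.3, 0.4), 0.5)}
same = objective({"a": "a", "b": "b"}, mu, nu, mu, nu)
swapped = objective({"a": "b", "b": "a"}, mu, nu, mu, nu)
```

Matching the graph against itself yields a cost of (numerically) zero, while swapping the two vertices violates the label constraint and yields an infinite cost.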



Performing Inexact Graph Matching. Many approaches have been explored for 
optimizing inexact graph matching for pattern recognition purposes, such as 
those mentioned in [3], [7] and [10]. 

In this work, the matching is achieved through the use of a tree search algo- 
rithm. Other possible alternatives include genetic and estimation of distribution 
algorithms [3]. 
In general terms, the tree-search optimization algorithm builds a search tree
where each vertex represents a pair of vertices $(k, l)$, $k \in N_1$ and
$l \in N_2$. The root vertex is labelled $(0, 0)$ and it is expanded into
$|N_2|$ sons labelled $(1, l)$, $l = 1 \ldots |N_2|$.

At each step $k$ of the algorithm, the son which minimizes the objective
function, say $(k, l_{min})$, is chosen and then expanded into $|N_2|$ sons
$(k + 1, l)$, $l = 1 \ldots |N_2|$. The process is repeated until a vertex
$(|N_1|, l)$ is reached, which guarantees that all vertices of $G_1$ have been
assigned to a vertex of $G_2$, thus establishing a homomorphism between the
input and model graphs.
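
The expansion scheme described above amounts to a greedy descent through the search tree: one input vertex is assigned per level, keeping only the best son. A minimal sketch follows; `greedy_tree_search` and `partial_cost` are hypothetical names, and the cost callback stands in for the objective function evaluated on a partial assignment (how the paper evaluates partial solutions is not specified here).

```python
def greedy_tree_search(input_vertices, model_vertices, partial_cost):
    """Assign each input vertex to the model vertex that minimizes the cost.

    partial_cost takes a dict {input vertex: model vertex} describing a
    partial assignment and returns a number (lower is better).
    """
    assignment = {}
    for k in input_vertices:
        # Expand the current tree vertex into |N2| sons and keep the best one.
        best = min(model_vertices,
                   key=lambda l: partial_cost({**assignment, k: l}))
        assignment[k] = best
    return assignment


# Toy cost: a made-up table of vertex-to-vertex dissimilarities.
table = {(0, "x"): 0.1, (0, "y"): 0.9,
         (1, "x"): 0.7, (1, "y"): 0.2,
         (2, "x"): 0.3, (2, "y"): 0.4}
result = greedy_tree_search(
    [0, 1, 2], ["x", "y"],
    lambda asg: sum(table[(k, l)] for k, l in asg.items()))
```

Because only one son is kept per level, the search evaluates $|N_1| \cdot |N_2|$ candidates rather than the $|N_2|^{|N_1|}$ possible mappings, at the price of possibly missing the global minimum.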



4 The Tracking Process 

In this section, we aim at generalizing the previous approach to video sequences.
Since digital video is composed of a sequence of images which change over time,
the methodology needs to incorporate this temporal aspect and the transitions
between images, reflecting facial feature changes throughout the video (e.g.
a progressive smile or a blink). In this section, we present our first approach
towards reflecting such changes in the facial features.

General Scheme. The overall sequence of steps performed in order to segment 
and recognize the facial features of interest in a generic frame of the input video 
sequence is illustrated in Figure 1. 




Fig. 1. Overview of the tracking process. 



Initially, approximate landmarks of the target facial features are located in
the first frame in order to constrain the region in which the oversegmentation
will be performed. They are obtained through the use of the Gabor Wavelet
Network (GWN) [6]. Then the previous algorithm for static images is applied in
the regions of the face around the landmarks.

One of the main contributions in the methodology is related to updating the
landmarks which will be used in the subsequent frame in the video sequence, thus
avoiding the need for a global face tracking procedure (i.e. GWN in our specific
case) in addition to the graph matching. If the same landmarks were applied to all
frames, the matching and recognition results would possibly not be satisfactory,
since the features in each frame usually have considerable differences in terms of
their positions.

Landmark Updating. The GWN technique could be applied to each frame of the
sequence in order to update the landmarks. However, it would be interesting
to make use of the information obtained directly from the graph methodology
and the model image. To accomplish this, an affine transformation is applied in
order to map the model image to the frame under consideration based on the
recognized facial features, allowing the landmark updating.

For the first frame, the model landmarks, which have been obtained as ex- 
plained in Sect. 2, are also used as landmarks for that frame. For the subsequent 
frames, once the recognition procedure is finished, the centroids of the facial 
features of interest are calculated. Also, the centroids of the pre-defined regions 
in the model are calculated. Then, the affine transformation that best maps the 
model set of centroids to that of the considered frame is estimated and applied 
through the following formula [5]: 

= a{A-t + 1}) (7) 

where A corresponds to a 2 x 2 non-singular matrix representing the sought 
transformation, a is any scalar value, and if, "s^ are the centroid-coordinate 
vectors for the frame and model respectively. 

This affine transformation allows us not only to update the input face land- 
marks to be applied to the following frame in the process, but also to project 
the model image onto the segmented and recognized target frame, conveying a 
visual assessment of the matching process. 
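
One common way to estimate an affine map from matched centroids is an ordinary least-squares fit, sketched below with NumPy. This is not necessarily the estimator of [5]: the parametrization `frame ≈ A · model + b`, the function names and the sample coordinates are assumptions made for illustration.

```python
import numpy as np


def fit_affine(model_pts, frame_pts):
    """Least-squares affine map from model centroids to frame centroids.

    Returns (A, b) with A a 2x2 matrix and b a translation vector such that
    frame_pts is approximated by model_pts @ A.T + b.
    Needs at least three non-collinear point pairs.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    frame_pts = np.asarray(frame_pts, dtype=float)
    # Homogeneous design matrix: one row [x, y, 1] per model centroid.
    X = np.hstack([model_pts, np.ones((len(model_pts), 1))])
    params, *_ = np.linalg.lstsq(X, frame_pts, rcond=None)  # shape (3, 2)
    return params[:2].T, params[2]


def apply_affine(A, b, pts):
    """Map points (e.g. model landmarks) into the frame."""
    return np.asarray(pts, dtype=float) @ A.T + b


# Made-up centroids: eyes, nose and mouth in the model, moved in the frame.
model = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0], [50.0, 80.0]])
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([5.0, -3.0])
frame = apply_affine(A_true, b_true, model)
A_est, b_est = fit_affine(model, frame)
```

Once the map is estimated, `apply_affine` can be used both to move the model landmarks into the next frame and to project the model mask onto the current one.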

Possible Extensions. Although this change in the methodology already makes it 
more robust for the application in video sequences, our ongoing research aims 
at making better use of the possibly redundant information present in distinct 
frames. 

One possible approach is to insert temporal edges to the set E of an ARG 
G. These edges would represent the transitions and relations among vertices 
of consecutive frames in the sequence. Through this, it would be possible to 
recalculate both vertex and edge attributes and a model image could be no 
longer needed for the recognition. Also, features which were not present in the 
first frame could be added to the recognition process on-the-fly. 

Furthermore, the results obtained in the graph matching procedure for, say, 
frame i could be reused as the initial solution for the matching step in frame i + 1, 
thus reducing the tree expansion and taking into account the smooth changes 
presented in frame transitions. This generalization belongs to our ongoing re- 
search. 

5 Results and Conclusion 

In this section we show some of the first results obtained from the application of
the new steps introduced in the previous section. For the tests, different video
sequences were considered, such as sequences of male and female faces with
static or changing background. All sequences presented considerable changes in
the face (e.g. smiles, head movement, blinking) throughout time.

Figure 2 depicts the results obtained for the frame-to-frame projection of 
the model-mask onto the corresponding target frames. The video sequence was 
composed of 96 color frames of size 512 x 512 which have been converted to 
grey-level images for the purpose of the algorithm. As it can be seen, the model 
mask is successfully matched up to the face, thus allowing it to be tracked along 
the video sequence. The facial features defined by the mask are approximately 
matched up to their correspondents in the image, though some mismatched 
regions (mouth in the last image) may be noted due to the global nature of the 
affine transform and to differences in the facial expressions among the frames. 




Fig. 2. Model masks superimposed on successive target frames using the recognized 
facial features. 



In terms of the results obtained by the advances proposed in this paper, i.e.
the landmark updating and its assessment through the projection of the model
mask onto the input face, it can be seen that the mask projection follows the
head movements in a plausible manner. Also, most facial features which may
be of interest are correctly tracked (e.g. eyebrows, nostrils, nose, lips), showing
that the recognition process and the landmark updating can be effective and
provide encouraging results.

Nevertheless, certain refinements in the technique are still called for,
especially when a considerable sudden change is present between frames, or when
unknown facial features, i.e. those which were not present in the model,
appear throughout the sequence. In such cases, the unknown facial features will
necessarily be mapped to one of the classified facial features, which might lead to
results such as the one seen in the frames of Figure 2 where a smile occurs.

Thus, in this paper we have proposed a first approach towards the
generalization of the methodology presented in [3]. The first results have shown
that it is possible to reflect the changes in facial features that occur from
frame to frame using appropriate affine transformations. Although the
introduced steps have provided encouraging results, the other possibilities
mentioned in Sect. 4 are being considered in our ongoing work.
Abstract. In the last three years NASA and some other space agencies have
shown interest in dating the surface of Mars, mainly because of the relationship
between its geological age and the probable presence of water beneath it. One
way to do this is by classifying craters on the surface according to their degree
of erosion. The naive way to solve this problem would be to let a group of experts
analyze the images of the surface and have them mark and classify the craters.
Unfortunately, this solution is unfeasible because the number of images is huge
in comparison with the human resources any group can afford. Different
solutions have been tried [1], [2] over this period of time. This paper offers an
autonomous computer vision system to detect the craters and classify them.



1 Introduction 

In the past three years, a number of studies supervised by NASA and other
research institutes and space agencies have sought to determine the geological
age of some celestial bodies. Many of these works focus on the surface of Mars,
mainly because of the relationship between its relative geological age and the
probable presence of water beneath the surface.

One way to assign a planet or a moon its relative geological age is by dating the
craters on it. The first naive approach to this problem would have a group of
experts analyze a series of images of the surface of Mars and classify the craters
that appear in them. However, there are not enough human resources available to
dedicate a team to this task. A number of different solutions have therefore been
studied. One project, named clickworkers and proposed by NASA investigators [1],
put a group of grayscale images from the NASA database of the Viking Orbiter
mission to Mars on the Internet (for a further look see
http://clickworkers.arc.nasa.gov/top ). After very basic training, anyone willing
to help could mark the position of craters and classify them within a series of
images presented to the collaborator, who received the title of clickworker. The
system presents every image not to one but to many clickworkers, who give their
opinion about the position and class of the craters. The system collects these
opinions and obtains their consensus. Finally, it colors a map of Mars based on
this information. Some of the results obtained by the clickworkers project (CP)
were quite similar to the solutions of the experts. The problem with this project
is that it still depends on the human factor. Some projects have tried to automate
this work, one example being the work done by Negrete [2]. Negrete proposed an
autonomous system based mainly on the Hough Transform to detect and mark craters
in an image; the system then used ontologies to classify the detected craters.
Unfortunately, that system was not able to detect more than 60% of the craters.
The purpose of this paper is to offer another computer vision system [3] for
marking and classifying craters using a number of image processing techniques [4]
and others, such as Fuzzy Logic [5]. A scheme of the proposed system is presented
in the following figure.




Fig. 1. Scheme of the system proposed for crater marking and classification 

A detailed explanation of the system and the results obtained from it are presented 
in the following sections. 



2 Computer Vision System 

The system proposed here is very simple, and it was intended to be this way, since the 
algorithm should be as fast as possible so it can manipulate a large volume of data. 
That is why a number of processes like equalization of the image, mathematical 
morphology and clustering among others are omitted at this point. In spite of the 
system’s simplicity, the results obtained were in general satisfactory. Moreover, some 
techniques were ruled out because the system was not truly enhanced when these 
were added. For example, equalization was omitted because the images captured by 
the satellite are almost equalized. Nevertheless, when this is not the case, the 
classification system absorbed most of these differences. In the following subsections 
a deeper description of the marking and classification subsystems will be presented. 



2.1 Crater Marking Subsystem 

Marking a crater is a three-step process (see Fig. 1). First, we must obtain the
borders of the image. A number of different techniques are available for this.
An analysis of the nature of the images suggested that the Laplacian of Gaussian
method (LOG) is a good choice in this case. Some other methods were tried, among
them Sobel (which is suggested by Negrete in his work [2]), but our experiments
favored LOG in most of the cases. The best results for LOG were attained when
using a threshold of 0.006 and a neighborhood of 2. These borders were fed as
input to the Hough Transform [6] to detect the circles (craters) in the image.



Hough Transform. As already stated, the Hough Transform (HT) is used to
obtain the probable locations of the craters. The general case of the HT was developed
to detect lines, but the generalization of the transform to detect other geometric
shapes is straightforward in many cases. An example of the pseudocode* for the HT
that detects the circles within an image, given the set of coordinates of the borders, a
radius r, and the angular step (sample_rate) between samples taken over the perimeter
of a circle, would be as follows:

HT_circles(borders, r, sample_rate)
    initialize(acum)
    for i ← 1 to length[borders]
        (x, y) ← borders[i]
        angle ← 0
        repeat
            angle ← angle + sample_rate
            a ← round(x - r * cos(angle))
            b ← round(y - r * sin(angle))
            acum[a, b] ← acum[a, b] + 1
        until angle > 2π
    return acum
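
For readers who prefer runnable code, the pseudocode can be transcribed into Python roughly as follows. This is a sketch rather than the authors' implementation: the accumulator is stored as a dictionary keyed by candidate centre (an assumption), and sampling starts at angle 0 instead of at the first increment.

```python
import math
from collections import defaultdict


def ht_circles(borders, r, sample_rate):
    """Vote for circle centres of radius r given border pixel coordinates.

    borders     : iterable of (x, y) border pixels
    r           : radius of the circles being sought
    sample_rate : angular step, in radians, between samples on each circle
    Returns a dict mapping candidate centres (a, b) to their vote counts.
    """
    acum = defaultdict(int)
    for x, y in borders:
        angle = 0.0
        while angle < 2 * math.pi:
            # Every centre at distance r from the border pixel receives a vote.
            a = round(x - r * math.cos(angle))
            b = round(y - r * math.sin(angle))
            acum[(a, b)] += 1
            angle += sample_rate
    return acum


# Synthetic rim of radius 10 centred at (50, 50).
rim = [(round(50 + 10 * math.cos(math.radians(d))),
        round(50 + 10 * math.sin(math.radians(d)))) for d in range(0, 360, 10)]
votes = ht_circles(rim, 10, 0.05)
peak = max(votes, key=votes.get)
```

In the usage example the strongest accumulator cell falls at, or immediately next to, the true centre; the small spread around it is caused by coordinate rounding.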

Nevertheless, a number of problems arise when the HT is used to detect a
geometrical shape, such as incomplete borders or deformed shapes. For this project,
both must be accounted for. One way to deal with shape deformations is to use
a variation of the HT called sliding window (HTSW). The
HTSW uses a neighborhood around the point being analyzed, which enhances the
local maxima stored in the accumulator. For the experiments an n by n neighborhood
was selected with the evaluated point as its center. The value of n was obtained as a
function of the radius (r) through the following expression:

$$n = \begin{cases} 2, & 15 \le r \\ 1, & r < 15 \end{cases} \tag{1}$$

To solve the problem of incomplete borders, the system takes into account a
threshold value $0 < \theta < 1$ which measures the percentage needed to detect a
crater. The best results were obtained for $\theta = 0.3$. The marking process
using a biased HTSW system is exemplified in the following figure.
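
A possible reading of the sliding window and of the threshold is sketched below: the votes inside an n by n neighbourhood of each accumulator cell are added up, and a centre is kept when that sum reaches a fraction θ of the votes a complete rim would produce. The function names, the dictionary accumulator and the `expected_votes` parameter are assumptions; the paper does not spell out these details.

```python
def windowed_votes(acum, a, b, n):
    """Sum the accumulator over the n x n neighbourhood placed at (a, b)."""
    half = n // 2
    return sum(acum.get((a + da, b + db), 0)
               for da in range(-half, n - half)
               for db in range(-half, n - half))


def detect_centres(acum, n, theta, expected_votes):
    """Keep cells whose windowed votes reach theta * expected_votes.

    expected_votes is the number of votes a complete, undeformed rim would
    cast for its true centre (assumed to be known by the caller).
    """
    return [(a, b) for (a, b) in acum
            if windowed_votes(acum, a, b, n) / expected_votes >= theta]


# Made-up accumulator: a peak split over two neighbouring cells, plus noise.
acum = {(5, 5): 10, (5, 6): 15, (20, 20): 5}
with_window = sorted(detect_centres(acum, 3, 0.3, 60))
without_window = detect_centres(acum, 1, 0.3, 60)
```

The example shows why the window helps with deformed rims: neither cell of the split peak reaches the threshold on its own, but their combined votes do.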



* The pseudocode follows the conventions proposed on Cormen’s book [7] 
Fig. 2. The marking process takes an image of the Mars surface (a), obtains its borders with
the LOG (b), and then uses these borders for the HTSW (c) to detect craters of a particular
radius (d)



Filtering. Once a crater has been marked, the borders associated with this crater are
deleted (since two different craters will have their own rims). Eliminating these
borders not only accelerates the process but also reduces the number of false
detections. After deletion is done there are still two problems that the system handles.
The first problem is that some detections do not correspond to a crater but to the
borders of a group of different geological features. To avoid
this, the system sums the borders in that area. If the number of borders not used in the
detection of the crater is greater than a threshold, the detected crater is discarded,
unless there is clear evidence of the presence of the crater (the accumulator shows that
at least 80% of the elements were accounted for). An example of how deletion and
filtering contributed to the detection process is presented in the following figure.




(a) (b) (c) 



Fig. 3. Series of images showing the craters detected with the HTSW (a), detection using 
deletion of the borders used (b) and detection with deletion and filtering (c) 



2.2 Crater Classification Subsystem 

The crater classification stated in the CP proposes three basic
categories, called fresh, degraded and ghost, related to crater aging. Depending on
its age, each crater will have a number of features that can be examined. The
definition given by the CP states that “...a fresh crater displays a sharp rim, distinctive
ejecta blanket, and well-preserved interior features (if any). In a degraded crater the
surrounding ejecta blanket is gone, interior features are largely or totally obliterated
and the rim is rounded or removed. Finally, the ghost craters are faintly visible
through overlying deposits”. The next figure shows an example of these classes.




a) b) c) 



Fig. 4. The basic classification proposed by the CP states that there are three categories based 
on crater aging. The craters could be fresh (a), degraded (b) and ghost (c) 

A common problem with automated classification arises when the rules that define
a category are ambiguous or there is some level of “spatial vagueness”. In these
cases a different approach, such as a fuzzy logic system, is needed. Fuzzy systems have
been applied to a number of different classification problems such as medical
applications [8], soil classification [9], fish grading [10], etc. For the crater
classification process, a number of problems are inherited from the ambiguity and
vagueness of the definitions, and because of this, the use of a fuzzy system was
considered. The proposed system is in fact a supervised fuzzy system.



Fuzzy System. As explained above, crater classification is
related to a number of features to be looked for in the image. Nevertheless, for an
untrained eye, trying to determine how recent the crater is might be equivalent to
simply saying how deep the crater is. Fortunately, this very simple feature proved to be
enough for the vast majority of the cases. A simple observation related to this
feature (how deep a crater is) is that the number of bright pixels for a fresh crater (Fig.
5(b)) is considerably larger than the same number for a ghost crater (Fig. 5(d)).
Another way to say this is that the distribution of the gray levels for a fresh crater
tends to be uniform, while a Normal distribution is better suited to ghost craters.




(a) (b) (c) (d) 

Fig. 5. Histograms (b), (d) of a fresh (a) and a ghost crater (c)



The fuzzy system proposed here uses as inputs the maxima over the first and
the second quartiles (max Q1, max Q2) of the histogram of the square region
containing a crater detected by the marking process. For simplicity, the fuzzy input
variables are named after the values they receive, i.e. max Q1, max Q2. Each of
these fuzzy inputs has three different sets, called small, medium and large,
which were determined experimentally. As output, a single variable is used. This
variable is called category and it contains three fuzzy sets (also
determined experimentally) called fresh, degraded and ghost. The sets of the input
variables are trapezoidal, while the sets for the classes are triangular, as can be seen in
Fig. 6. The input variables are related to the category simply by an AND operator,
as reflected in the following table.

Table 1. AND relationship between the fuzzy input variables $\max Q_1$ and
$\max Q_2$ and the fuzzy output variable category for the classification system
(rows: $\max Q_1$; columns: $\max Q_2$)

max Q_1 \ max Q_2    small       medium      large
small                degraded    ghost       ghost
medium               degraded    degraded    degraded
large                fresh       fresh       degraded



For the defuzzification, the mean of maximum method was selected. The
combination of this method with the type of sets chosen for the classes made it
possible to determine, by using a simple hard-limiter nonlinear function, which
category the crater belonged to. As an example, the following figure shows the
results of the fuzzy system when it was fed with the pairs (0.0267, 0.0124) and
(0.0009, 0.0276). These pairs represent the input ($\max Q_1$, $\max Q_2$) for
the craters in Fig. 5(a) and (c).




Fig. 6. Output of the system when it was fed with the inputs for the fresh crater
shown in Fig. 5(a) (a) and the ghost crater shown in Fig. 5(c) (b); both craters
were classified correctly

This system correctly classified more than 90% of the marked craters.



3 Results 

To determine the factors used for every part of the system, a number of
experiments were conducted over a set of 100 grayscale images of 256 x 256
pixels containing slightly fewer than 300 craters. The experiments were carried
out first to obtain the parameters related to the marking process. After the
best set of parameters had been determined from these results, the performance
of the classifier was measured.



3.1 Experiments for the Marking 

For the marking process, a group of 30 images containing approximately 100
craters was used. A number of border detection techniques were tested. In
combination with them, different approaches were tested, such as the HT, HTSW,
HTSW with border deletion (HTSW/D) and HTSW with deletion and filtering
(HTSW/DF). Table 2 contains the results for some of the schemes that were
studied. For each combination, the first number is the percentage of craters
detected, while the second is the percentage of false detections. For example,
when the HTSW/DF was selected to detect craters with a threshold θ = 0.3 using
the Sobel technique with a threshold of 0.06, 53.12% of the craters were
detected, but 23.52% of the crater detections were false. In some cases, the
number of false detections for a particular combination was above 50%. When
this is the case, the data from the experiments are not presented in the table.



Table 2. Statistics obtained from some of the different combinations tried for
crater marking (-: not presented)

                        Canny        Sobel    Sobel    LOG      LOG      LOG
                        0.175,0.05   0.05     0.06     0.0055   0.006    0.0065
HTSW/DF    detected     -            -        65.62%   -        81.25%   66.67%
θ = 0.25   false        -            -        41.83%   -        43.48%   42.11%
HTSW/DF    detected     -            47.75%   53.12%   68.75%   71.88%   65.63%
θ = 0.3    false        -            46.14%   23.52%   42.42%   22.58%   19.23%
HTSW/DF    detected     37.50%       21.50%   48.48%   46.87%   43.75%   40.62%
θ = 0.35   false        38.09%       31.17%   11.76%   22.22%   14.29%   4.17%



It is important to point out that the output of the HT and HTSW in general
contained more than 50% false detections. From the results presented in the
table, it can be said that the best trade-off between true and false recognition
was obtained for the HTSW/DF when the threshold was set at θ = 0.3 and the LOG
method was chosen for the borders with a threshold of 0.006.



3.2 Results for the Classification 

For the classification process, the best system (HTSW/DF and LOG with the proper
parameters) was used to mark the craters of the 100 images, which contain
slightly fewer than 300 craters. After the marking was done, 200 images
containing craters (no false detections were selected) were manually classified
using the CP criteria. The classification subsystem was fed with the previously
selected images and its results were compared with those obtained manually. The
two sets of results agreed in 91.5% of the cases.






A. Flores-Mendez 



4 Conclusions and Further Work 

The time used by the system to mark and classify the craters (less than 5
seconds per image on average using a Pentium 4, 2 GHz, 512 MB RAM computer)
suggests that this process can be used in practice. Nevertheless, if the process
is to be useful, the marking subsystem needs to be developed further to reach at
least 85% recognition while false recognition is kept below 7%. Some of the
research lines currently being studied indicate that both percentages can be
achieved, but further work will be needed. Unfortunately, recognition close to
100% has already been ruled out, because the recognition of some "hard to
identify" craters (usually phantom craters) almost certainly implies an increase
in the number of false recognitions. For the fine tuning of the detection, it is
probable that Mathematical Morphology will be helpful. Still, more experiments
in this direction remain to be carried out.

On the other hand, the classification subsystem proved to be not only a good
classifier but also a very robust one, because it was capable of correctly
classifying craters even when the area contained the crater only partially, when
the area was bigger than the crater, or when the area contained other elements
besides the crater. The success of this part of the system, in my opinion,
resides mainly in the selection of the histogram-based feature, and because of
this, techniques other than fuzzy logic might be used with this feature as input
to classify the craters. Still, I do not believe that great improvements can be
made along this line, since the classification remains a subjective process;
results in this area might therefore vary, but not meaningfully.



Abstract. We present two observation models based on optical flow
information to track objects using particle filter algorithms. Although,
in principle, the optical flow information enables us to know the
displacement of the objects present in a scene, it cannot be used directly
to displace a model, since flow estimation techniques lack the necessary
precision. We instead define two observation models for use in
probabilistic tracking algorithms: the first uses an optical flow
estimation computed previously, and the second is based directly on
correlation techniques over two consecutive frames.



1 Probabilistic Tracking 

The probabilistic models applied to tracking [1,2,3] enable us to estimate the a 
posteriori probability distribution of the set of valid configurations for the object 
to be tracked, represented by a vector X, from the set of measurements Z taken 
from the images of the sequence, p(X|Z). The estimation in the previous instant 
is combined with a dynamical model giving rise to the a priori distribution in 
the current instant, p(X). The relation between these distributions is given by 
Bayes’ Theorem: 



$$p(X|Z) \propto p(X) \, p(Z|X)$$

The distribution $p(Z|X)$, known as the observation model, represents the
probability of the measurements Z appearing in the images, assuming that a
specific configuration of the model in the current instant is known.

In this paper, two optical flow based observation models are defined. The first
one uses as evidence an existing estimation of the optical flow of the sequence,
and the second one is based on correlation techniques.



2 Optical Flow Estimation

The best-known hypothesis for calculating the optical flow [4] assumes
that the intensity structures found in the image, on a local level, remain
approximately constant over time, at least during small intervals of time.

There is no algorithm for estimating the optical flow field which is clearly
superior to the others. Each may have small advantages over the others in
particular situations, but in general it may be said that from a practical point
of view all are equivalent [5,6]. In this paper, we have preferred to use the
algorithm in [7], for the following reasons:

— It does not impose restrictions on the sequence to be analyzed. 

— It provides a dense estimation of the optical flow. 

— It is designed to preserve discontinuities in the flow, which is necessary for 
the observation model proposed in this section to behave appropriately. 



3 The Dynamical Model 

Other authors have successfully used characteristics such as the gradient [8] or
intensity distributions [3] for tracking tasks. The dynamical model of the object
provides an a priori distribution over all the possible configurations at the
instant $t_k$, $p(X(t_k))$, from the distributions estimated at the previous
instants of time. In this paper, a second-order dynamical model has been used in
which the two previous states of the object model are considered; this is
equivalent to taking a first-order dynamical model with a state vector for the
instant $t_k$ of the form [8]

$$\mathcal{X}_{t_k} = [X_{t_{k-1}} \;\; X_{t_k}]^T$$

The integration of the a priori distribution p(X) with the set Z of the
evidence present in each image, in order to obtain the a posteriori distribution
p(X|Z), is obtained with Bayes' Theorem. This fusion of information can be
performed, if the distributions are Gaussian, using the Kalman filter [1].
However, in general, the distributions involved in the process are non-Gaussian
and multimodal [2]. Sampling methods for modeling this type of distribution [9]
have proved to be extremely useful, and particle filter algorithms [10,11,3],
based on sets of weighted random samples, enable these distributions to be
propagated effectively.
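
One iteration of such a particle filter can be sketched as below. This is a minimal illustration, not the algorithm of [10,11,3]: it assumes a scalar configuration, a constant-velocity form of the second-order model (the next state is extrapolated from the two previous ones plus Gaussian noise), and a caller-supplied likelihood function standing in for the observation model p(Z|X). The names condensation_step, noise_std and toy_likelihood are hypothetical.

```python
import math
import random


def condensation_step(particles, weights, likelihood, noise_std=1.0, rng=random):
    """One resample / predict / weight iteration.

    Each particle is a pair (x_prev, x_curr), i.e. the augmented state
    [X_{t_{k-1}}, X_{t_k}] of a second-order model (scalar here for brevity).
    """
    n = len(particles)
    # 1. Resample: draw n particles with probability proportional to weight.
    chosen = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: assumed constant-velocity dynamics plus Gaussian noise.
    predicted = []
    for x_prev, x_curr in chosen:
        x_next = 2.0 * x_curr - x_prev + rng.gauss(0.0, noise_std)
        predicted.append((x_curr, x_next))
    # 3. Weight: evaluate the observation model on the newest configuration.
    new_weights = [likelihood(x_next) for _, x_next in predicted]
    total = sum(new_weights)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # degenerate case: fall back to uniform
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


def toy_likelihood(x):
    """Hypothetical observation model: Gaussian bump around position 10."""
    return math.exp(-0.5 * ((x - 10.0) / 2.0) ** 2)


rng = random.Random(0)
particles = [(float(i), float(i)) for i in range(20)]
weights = [1.0 / 20] * 20
particles, weights = condensation_step(particles, weights, toy_likelihood,
                                       noise_std=0.5, rng=rng)
estimate = sum(w * x for (_, x), w in zip(particles, weights))
```

The weighted mean computed in the last line plays the role of the distribution average that is drawn in the result figures.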



4 Observation Models 

If there is a good optical flow measurement and the object is perfectly localized,
it is possible to slide the points of the model in accordance with the flow vectors,
thereby obtaining a good estimate of its position for the following frame.
Unfortunately, the small errors in the flow accumulate with each frame, so that
the model gradually separates from the real object, until it loses it completely.



Using Optical Flow for Tracking 






Nevertheless, it may be supposed that the object to be tracked moves in an
environment that contains other displacements, and therefore it may be assumed
that there will be discontinuities in the optical flow on its contour, or at
least on part of it. The observation model is defined in such a way that it
favors not only configurations in which the flow inside the object agrees with
the displacement implied by the value of $\mathcal{X}$, but also configurations
in which discontinuities in the optical flow appear on the contour of the model.



4.1 Observation Model Based on Optical Flow 

Let us suppose that we have an estimation of the flow field v for the image I at
the instant $t_k$. The following error function may be defined, with
$S \subseteq I$ being an area inside the image:

$$Z_S(v; d) = \sum_{(x,y) \in S} W(x,y) \, \| v(x,y) - d(x,y) \|^2 \qquad (1)$$

where $W(x,y)$ is a weight function and $d(x,y)$ is given by the state vector
$\mathcal{X}$, relating the point on the model at the instant $t_{k-1}$ with the
same point at the instant $t_k$. This measurement is always non-negative and is
equal to zero only when the flow vectors agree perfectly with the displacement
predicted by the model.

Let us now consider a point $\mathbf{x} = (x, y)$ of the image belonging to the
outline of the model at the instant $t_k$. This point is given by the expression

$$\mathbf{x} = f(X_{t_k}; m)$$

where $X_{t_k}$ defines the specific configuration of the object model, and $m$
is the parameter vector which associates each point within the model with a
point on the image plane. The displacement vector can be calculated for the same
point on the model between two consecutive instants of time as

$$d(\mathcal{X}_{t_k}, m) = f(X_{t_k}; m) - f(X_{t_{k-1}}; m) \qquad (2)$$

Considering $S$ as a 2D region centered at $\mathbf{x}$, the measurement (1)
would be:

$$Z_S(\mathcal{X}_{t_k}, m) = \sum_{(x,y) \in S} W(x,y) \, \| v(x,y) - d(\mathcal{X}_{t_k}, m) \|^2 \qquad (3)$$

The flow field is expected to present discontinuities on the boundaries of
the moving objects (otherwise, it would be impossible to locate the object from
the flow vectors alone), which is why we subdivide $S$ into two areas $S_i$ and
$S_e$, corresponding respectively to the parts of $S$ interior and exterior to
the object contour. If the model's prediction is good enough, the adjustment
must be much better in $S_i$ than in $S_e$, so that the point in question may be
considered to be placed on the contour. In order to detect this, we compute the
following measurement:

$$Z(\mathcal{X}_{t_k}, m) = \frac{Z_{S_e}(\mathcal{X}_{t_k}, m)}{Z_{S_i}(\mathcal{X}_{t_k}, m) + Z_{S_e}(\mathcal{X}_{t_k}, m)} \qquad (4)$$



The value of $Z(\mathcal{X}_{t_k}, m)$ satisfies the following properties:

- $0 \le Z(\mathcal{X}_{t_k}, m) \le 1$

- If $Z_{S_i}(\mathcal{X}_{t_k}, m) \ll Z_{S_e}(\mathcal{X}_{t_k}, m)$, then
  $Z(\mathcal{X}_{t_k}, m) \to 1$, which indicates that the adjustment is much
  better in $S_i$ than in $S_e$, and therefore the point must be situated
  exactly on a flow discontinuity, in which the inner area coincides with the
  displacement predicted by the model.

- If $Z_{S_e}(\mathcal{X}_{t_k}, m) \ll Z_{S_i}(\mathcal{X}_{t_k}, m)$, then
  $Z(\mathcal{X}_{t_k}, m) \to 0$. The adjustment is worse in the inner area
  than it is in the outer area, and therefore the estimated flow does not match
  the model's prediction.

- If $Z_{S_e}(\mathcal{X}_{t_k}, m) = Z_{S_i}(\mathcal{X}_{t_k}, m)$, then the
  adjustment is the same in the inner area as it is in the outer area, and
  therefore the flow adequately matches the displacement predicted by the model,
  but it is impossible to guarantee that the point is situated on a flow
  discontinuity, nor, therefore, on the contour of the object. In this case,
  $Z(\mathcal{X}_{t_k}, m) = 1/2$.
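
The behaviour listed above can be checked numerically with a small sketch. It reads expression (4) as the outer error divided by the sum of inner and outer errors, which is the form consistent with the listed properties. The dictionary-based flow representation, the function names and the value returned when both errors are zero are assumptions made only for this illustration.

```python
def flow_error(flow, weights, region, displacement):
    """Weighted squared mismatch between flow vectors and one displacement.

    flow and weights are dictionaries keyed by pixel (x, y); region is the list
    of pixels to sum over; displacement is the (dx, dy) predicted by the model.
    """
    total = 0.0
    for pixel in region:
        vx, vy = flow[pixel]
        total += weights[pixel] * ((vx - displacement[0]) ** 2 +
                                   (vy - displacement[1]) ** 2)
    return total


def contour_measure(z_inner, z_outer):
    """Ratio of the outer error to the total error, in the range [0, 1]."""
    total = z_inner + z_outer
    if total == 0.0:
        return 0.5  # assumption: no mismatch on either side counts as a tie
    return z_outer / total


# Toy neighbourhood: the inner pixels move by (2, 0), the outer ones are static.
inner = [(0, 0), (0, 1)]
outer = [(1, 0), (1, 1)]
flow = {(0, 0): (2.0, 0.0), (0, 1): (2.0, 0.0),
        (1, 0): (0.0, 0.0), (1, 1): (0.0, 0.0)}
weights = {pixel: 1.0 for pixel in flow}

good = contour_measure(flow_error(flow, weights, inner, (2.0, 0.0)),
                       flow_error(flow, weights, outer, (2.0, 0.0)))
bad = contour_measure(flow_error(flow, weights, inner, (0.0, 0.0)),
                      flow_error(flow, weights, outer, (0.0, 0.0)))
print(good, bad)  # 1.0 0.0
```

A prediction that matches the inner flow yields the value 1, while a prediction that matches only the background yields 0.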



Assuming that the values of $Z_{S_i}$ and $Z_{S_e}$ are bounded, it may be
assumed that the probability of a point on the image corresponding to the point
on the outline of the model given by the vectors $\mathcal{X}_{t_k}$ and $m$ is
proportional to $Z(\mathcal{X}_{t_k}, m)$:

$$p(Z|\mathcal{X}_{t_k}, m_i) \propto Z(\mathcal{X}_{t_k}, m_i) \qquad (5)$$

Finally, assuming statistical independence, we may obtain the expression for the
observation model based on optical flow vectors as the product of the values
obtained for each individual point on the contour:

$$p(Z|\mathcal{X}_{t_k}) \propto \prod_i Z(\mathcal{X}_{t_k}, m_i) \qquad (6)$$

with $m_i$ being the vector which identifies the i-th point on the contour of
the model.
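
When the product over the contour points is evaluated numerically, it is convenient to sum logarithms instead of multiplying many values smaller than one. The following sketch does this; the floor eps, which keeps a single zero from making the logarithm undefined, is an assumption of this illustration and not part of the model.

```python
import math


def observation_log_likelihood(point_values, eps=1e-12):
    """Logarithm of the product of the per-point values, up to a constant."""
    return sum(math.log(max(value, eps)) for value in point_values)


# Two hypothetical samples: one lying on flow discontinuities, one not.
on_contour = [0.9, 0.8, 0.95, 0.85]
off_contour = [0.5, 0.4, 0.5, 0.6]
better = (observation_log_likelihood(on_contour) >
          observation_log_likelihood(off_contour))
```

Exponentiating the result and normalising over all samples gives weights that can be used directly in the particle filter.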

When it comes to partitioning the neighborhood $S$ corresponding to a point
$\mathbf{x}$ of the contour of the model into two halves, one ($S_i$) inside and
the other ($S_e$) outside the model, a good approximation consists in using the
tangent to the contour at $\mathbf{x}$ as the dividing line between $S_i$ and
$S_e$.
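
A simple way to realise this partition is to test on which side of the tangent each pixel of the neighbourhood lies, using the sign of its dot product with the inward normal. The sketch below assumes a square neighbourhood and leaves out the pixels lying exactly on the tangent; both choices, and the function name, are assumptions of this illustration.

```python
def split_by_tangent(center, inward_normal, half_size=2):
    """Split a (2*half_size+1)^2 window around center into inner/outer pixels.

    inward_normal is a vector perpendicular to the contour tangent at center,
    pointing towards the interior of the model.
    """
    cx, cy = center
    nx, ny = inward_normal
    inner, outer = [], []
    for dy in range(-half_size, half_size + 1):
        for dx in range(-half_size, half_size + 1):
            side = dx * nx + dy * ny  # signed position relative to the tangent
            if side > 0:
                inner.append((cx + dx, cy + dy))
            elif side < 0:
                outer.append((cx + dx, cy + dy))
            # pixels on the tangent itself (side == 0) are not assigned
    return inner, outer


# 5x5 window around (10, 10) with the interior towards increasing x.
inner, outer = split_by_tangent((10, 10), (1.0, 0.0))
print(len(inner), len(outer))  # 10 10
```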

The difficulty in determining a dense flow [12] has led us to establish that
those points with a more reliable flow measurement are of more use when
calculating the internal and external values for the measurement $Z$
(expression (1)). This is easily achieved by calculating the value of $W(x, y)$
in this expression from any of the available reliability measurements of the
calculated flow. In the experiments carried out for this paper, the magnitude of
the intensity gradient has been used:

$$W(x,y) = \| \nabla I(x,y) \|$$



4.2 Observation Model Based on Similarity Measures

In the model defined in this section, similarity measurements are used to
estimate the observation probability of each point of the contour. If the
prediction which the model makes is good and the intensity maps corresponding
to the neighborhood of each point are superimposed, the inner part of the model
must fit better than the outer part.

Let $\mathbf{x} = f(X_{t_k}; m)$ be a point belonging to the model contour at
the instant $t_k$, let $S$ be a neighborhood of $\mathbf{x}$ subdivided in turn
into $S_i$ and $S_e$, let $d(\mathcal{X}_{t_k}, m)$ be calculated from
expression (2), and let $I^{(k-1)}$ and $I^{(k)}$ be the images corresponding to
the instants of time $t_{k-1}$ and $t_k$. The quadratic errors are therefore
calculated in the following way:

$$Z_{S_i}(\mathcal{X}_{t_k}, m) = \sum_{\mathbf{x} \in S_i} W(\mathbf{x}) \left( I^{(k-1)}(\mathbf{x}) - I^{(k)}(\mathbf{x} - d(\mathcal{X}_{t_k}, m)) \right)^2$$

$$Z_{S_e}(\mathcal{X}_{t_k}, m) = \sum_{\mathbf{x} \in S_e} W(\mathbf{x}) \left( I^{(k-1)}(\mathbf{x}) - I^{(k)}(\mathbf{x} - d(\mathcal{X}_{t_k}, m)) \right)^2 \qquad (7)$$

where $W(\mathbf{x})$ is a weighting function. Two non-negative magnitudes are
obtained, which may be combined using expression (4) in order to obtain a value
of $Z(\mathcal{X}_{t_k}, m)$. Since the magnitudes $Z_{S_i}$ and $Z_{S_e}$ are
bounded, $Z(\mathcal{X}_{t_k}, m)$ may be considered to be proportional to the
observation density $p(Z|\mathcal{X})$, and therefore we again have:

$$p(Z|\mathcal{X}_{t_k}, m_i) \propto Z(\mathcal{X}_{t_k}, m_i) \qquad (8)$$

Supposing that the measurements on each point are statistically independent, we
can use expression (6) to compute the final observation probability.
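
The quadratic errors of expression (7) amount to a weighted sum of squared differences between two image patches. The sketch below follows the expression as it is read here, comparing the first image at x with the second image at x - d; it assumes integer displacements, images stored as lists of rows indexed [y][x], and regions that stay inside both images. The function and variable names are hypothetical.

```python
def similarity_error(first, second, region, displacement, weights=None):
    """Weighted sum over the region of (first(x) - second(x - d))^2."""
    dx, dy = displacement
    total = 0.0
    for (x, y) in region:
        weight = 1.0 if weights is None else weights[(x, y)]
        difference = first[y][x] - second[y - dy][x - dx]
        total += weight * difference * difference
    return total


# Toy data: a horizontal ramp, and a second image built so that the pair
# matches exactly under the displacement (1, 0) in the convention above.
first = [[x + 10 * y for x in range(5)] for y in range(2)]
second = [[first[y][x + 1] if x < 4 else 0 for x in range(5)] for y in range(2)]
region = [(x, y) for y in range(2) for x in range(1, 4)]

matched = similarity_error(first, second, region, (1, 0))
unmatched = similarity_error(first, second, region, (0, 0))
print(matched, unmatched)  # 0.0 6.0
```

Evaluating the function once on the inner half and once on the outer half of a neighbourhood gives the two magnitudes that are then combined with expression (4).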

5 Experiments 

The proposed observation models have been incorporated into the Condensation
algorithm [8], and their performance has been compared with that of the
observation model based on normals as proposed in [8]. Two image sequences
are used, lasting 10 seconds, with 25 frames per second, 320x240 pixels and 8
bits per band and pixel, corresponding to the movement of a hand over a uniform
and a non-uniform background.



5.1 Tracking an Object over a Uniform Background

For modelling the hand, a contour model based on a closed spline with 10 control
points and a Euclidean similarity deformation space were used.

For the observation model based on contour normals, 20 normals were drawn for
each sample. The observation model was applied with parameters $\alpha = 0.025$
and $\sigma = 3$, incorporated into the Condensation algorithm with 200
samples. The initialization was carried out manually, indicating the position
of the object in the first frame. Figure 1.a shows the weighted average of the
distribution obtained.

Fig. 1. a) Results obtained with the observation model for the contour normals, b)
Results obtained with the observation model based on optical flow, c) Results obtained
with the observation model based on similarity measures. The distribution average
appears as a solid line in the current frame, and the averages in some previous
frames appear as dashed lines.

For our first observation model, the algorithm of [7] was used on the images
to obtain an optical flow map between each pair of consecutive frames. The size
of the area centered on each point was 5x5 pixels. As the reliability measure
$W(x, y)$ for weighting the quadratic differences in expression (3), the
magnitude of the intensity gradient $\nabla I$ at each point was used.

The Condensation algorithm was applied in exactly the same conditions
as for the previous model, obtaining the results shown in Figure 1.b.

In order to apply the observation model based on similarity measures, the
same conditions were used as in the previous experiments (200 samples and 20
points along the contour, considering a neighborhood of 5x5 pixels for each
point). The result obtained is illustrated in Figure 1.c.



5.2 Tracking an Object over a Non-Uniform Background

In order to use the observation model based on contour normals, 18 normals
were drawn for each contour, and the same technique was used to detect the
boundaries as in the previous series of experiments, with a slightly
lower threshold (0.04). The number of samples is still 200, and the parameters
for the observation model in this case were $\sigma = 3$ and $\alpha = 0.055$.
The results can be seen in Figure 2.a.

For the observation model based on optical flow, the algorithm in [7] was
used once again, with areas of 5x5 pixels and $W(x,y) = \| \nabla I \|$. With
the same number of samples as in the previous experiment (200), and the same 18
points on the contour, the results (Figure 2.b) are clearly better for this
observation model.

For the observation model based on similarity measures, neighborhoods of
5x5 pixels and 200 samples for the Condensation algorithm were also used.
The results obtained are shown in Figure 2.c.

Fig. 2. a) Results obtained with the observation model for the contour normals, b)
Results obtained with the observation model based on optical flow, c) Results obtained
with the observation model based on similarity measures. The distribution average
appears as a solid line in the current frame, and the averages in some previous
frames appear as dashed lines.



6 Discussion and Conclusions 

The experimental results obtained by the two proposed observation models on
the sequence with a uniform background are satisfactory, although at one moment
the distribution average strays slightly below and to the right of the hand,
covering its shadow. This is because, since there is no texture on the
background, the shadow appears as a small grey patch which moves around with
the hand, so that the flow boundary can be placed on the contour of the
hand-shadow set.

In the second sequence, there were significant differences in the tracking
results depending on which observation model was used. With the observation
model for the contour normals, as there are many edges in the background,
samples emerge with a significant likelihood value although they are not placed
on the object. Consequently, the distribution average strays from the real
position of the object in some frames, although at no time does it lose it
completely.

With the two new observation models, the model never loses the object and is
not affected by the presence of clutter outside the real object, since the only
discontinuities in the flow map are those given by the contour of the hand.

As can be seen, over a non-uniform background, the observation models proposed
here perform better than the model based on contour normals. Over a uniform
background, the absence of texture means that the model based on normals
behaves better. This suggests that the proposed models and the contour normals
model can be considered, in some way, complementary.

Acknowledgement. This work has been financed by the grant TIC-2001-3316
of the Spanish Ministry of Science and Technology.



Abstract. This paper demonstrates a technique for addressing the following three
problems in the motion analysis of video sequences: automatic extraction of
moving objects, suppression of the remaining errors, and solution of the
correspondence problem. We use a new paradigm for solving the correspondence
problem and then determining a motion trajectory, based on a trisectional
structure: it distinguishes between, firstly, real-world objects; secondly,
image features such as Motion Blobs and colour patches; and thirdly, abstract
objects such as Meta-Objects that denote real-world objects. The efficiency of
the suggested technique for determining the motion trajectory of moving objects
is demonstrated in this paper through the analysis of strongly disturbed real
image sequences.



1 Introduction 

Video object segmentation is required by numerous applications, ranging from
high-level computer vision tasks [1] and traffic monitoring [2] to
second-generation video coding [3]. The suggested technique aims to segment each
moving object automatically and, furthermore, to determine the motion trajectory
of these moving objects by means of a multi feature correlation, MFC. Our
approach differs from other methods of motion analysis (blockmatching, BM [4],
optical flow [5], feature-based methods [6,7] and deformable model-based methods
[8]) in that, on the one hand, object-adapted regions are used instead of fixed
blocks and, on the other hand, the colour information is evaluated. Another
significant difference between the proposed work and the work reported by other
researchers is that, instead of image primitives (e.g. edges and corners),
hierarchical features extracted from the moving image regions are used for the
solution of the correspondence problem in image sequences. This is because image
primitives occur in large numbers and generally do not have good characteristics
for the removal of ambiguities. In fact, a complete search of the image under
consideration is impossible for primitive-oriented concepts. In contrast, our
method extracts only the moving regions for the solution of the correspondence
problem in image sequences. This procedure reduces the amount of image data and
also allows a robust determination of motion information.



2 Tracking Paradigm

In image processing, the question of “what is an object or a car” is of a
philosophical nature. One can try to find objects by searching for them with
pre-defined model patterns, which requires previous knowledge. Naturally, no
clear set of models is available that can cover the whole spectrum of vehicles.
Therefore the proposed method takes a different route.

Since real-time processing and robustness in the segmentation of moving objects
are essential for video surveillance and tracking analysis, a modified
difference-image-based (MDI) approach is used in this paper for the segmentation
of moving objects. The whole procedure allows arbitrary objects to be extracted
automatically. After the regions of motion have been extracted, some residual
errors generally remain, which are eliminated via morphological operations and a
separation algorithm. In this way a robust segmentation of moving image regions
is achieved despite the changes in lighting conditions that often occur in real
environments. The resulting image regions represent the object candidates that
are used for further tracking analysis. Here the tracking paradigm can be
defined as “the Motion Blobs (MB) and colour patches (colour-structure code,
CSC) representing two feature-levels in a tracking hierarchy” (Fig. 1).
Basically, these features describe the noticeable motion in a pair of
consecutive images. To find and track objects in a longer image sequence, an
abstraction level is introduced. At this level a set of so-called Meta-Objects
(MO) is used to denote real-world objects. MOs are specified by a set of
CSC-Patches. The task of tracking an MO now splits into two parts. The first is
a correlation at the feature level, i.e. a correlation of MB and CSC-Patches in
image pairs. The second is a correlation at the abstraction level, i.e. the
assignment of the correlated CSC-Patches to the right MO. In general, the
paradigm uses a trisection: it distinguishes between, firstly, the real-world
objects; secondly, the image features such as MB and CSC-Patches; and thirdly,
the abstract objects such as Meta-Objects that denote real-world objects. The
relationship between the feature and abstraction levels will be explained in
detail after the introduction of the MB and CSC-Patch correlation.




Fig. 1. The suggested technique for the automatic detection and tracking analysis
(diagram labels: Feature Level, Meta-Objects, Abstraction Level)



2.1 Automatic Segmentation of the Moving Objects 

Compared with the motion-field-based methods [4] for segmenting moving objects, a
difference image (DI) scheme represents a simple way to detect the moving objects
in a scene, because a difference image is produced quickly by simple
subtractions. It should be emphasised, however, that the transition borders
between regions (i.e. discontinuities) are not cleaned up by this approach,
because some parts of the stationary background are detected as a moving region
(Fig. 2, $DI_{t-1}$). The cause is that the background changes behind the moving
object, and this change is detected as movement in the resulting $DI_{t-1}$
difference image. A consequence is that the moving image regions merge with this
background. In extreme cases the stationary background is detected as a moving
region. This problem is the motivation for a modified approach to the
segmentation of the moving objects. The principle of this modified difference
image-based (MDI) approach is simply that the resulting difference image is
obtained by multiplying the moving regions extracted from two consecutive DIs
(Fig. 2). This operation takes place in accordance with Eq. 1. Zero crossings
will thus appear in the DI image; however, an accurate position of the moving
objects (middle image) at the time t is obtained:

$$MDI(x, y, t) = DI_{t-1,t}(x, y) \cdot DI_{t,t+1}(x, y) \qquad (1)$$
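
The effect of the multiplication can be seen on a toy example. The sketch below assumes that each difference image is the absolute grey-value difference of two consecutive frames (the text does not specify the exact form) and uses a single-row image in which a bright object moves one pixel per frame; the function names are hypothetical.

```python
def difference_image(frame_a, frame_b):
    """Pixel-wise absolute difference of two equally sized grey-value frames."""
    return [[abs(b - a) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def modified_difference_image(prev_frame, curr_frame, next_frame):
    """Pixel-wise product of the two difference images around curr_frame."""
    di_prev = difference_image(prev_frame, curr_frame)
    di_next = difference_image(curr_frame, next_frame)
    return [[p * n for p, n in zip(row_p, row_n)]
            for row_p, row_n in zip(di_prev, di_next)]


# One-row frames: an object of grey value 9 moves from column 1 to 2 to 3.
prev_frame = [[0, 9, 0, 0, 0]]
curr_frame = [[0, 0, 9, 0, 0]]
next_frame = [[0, 0, 0, 9, 0]]

print(difference_image(prev_frame, curr_frame))  # [[0, 9, 9, 0, 0]]
print(modified_difference_image(prev_frame, curr_frame, next_frame))
# [[0, 0, 81, 0, 0]]
```

The plain difference image responds at both the old and the new position of the object, whereas the product is non-zero only at its position in the middle frame.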

In general, the MDI approach has two advantages. The first is that the motion
regions at the multiplication level keep the shape of the moving object at time
t. Regions in a normal DI do not express the shape of the object, because they
are a mixture of the object shape on the image plane at time t-1 and that at
time t. The other advantage is that it is easy to detect whether the current
frame contains motion information or not. If the motion regions in the MDI are
small or do not exist, the moving object is standing still and there is no need
to estimate the pose in that frame.

It can be seen from Fig. 2 that the MDI approach still leaves a substantial
problem unsolved. If a pixel of a moving object coincidentally has the same grey
value as the stationary background, or if the foreground has no texture at this
position, then the change of the intensity value will not be detected in the
difference image, although the object moved there. These pixels are missing in
the difference image, so that the resulting moving regions contain holes.

The goal of the following pre-processing algorithm is the suppression of the
errors remaining after the segmentation (holes, outliers or fusion of regions).
The holes are filled first via morphological operations, for which a suitable
structural element must be defined. The optimal size of the structural element
is reached through the following steps: (1) closing-dilation with a larger
square structural element, in order to interconnect regions with much movement;
(2) closing-erosion with a small structural element, in order to separate close
objects. Subsequently, a separation algorithm is applied to separate merged
image regions (removal of thin background connections between objects). The
separation takes place if two peripheral points have a Euclidean distance d to
each other that is below a threshold value d_min, and at least a minimum number
n of peripheral points lies between the two points. This algorithm is also used
for smoothing the region contour, so that outliers are removed simply and
quickly from the previously segmented regions (Fig. 2).

The resulting image is binary only after execution of the multiplication
operation and not before. In this way, both weak edges that move fast and strong
edges that move slowly are taken into account. The motion mask can be generated
by applying a threshold value to these regions. After the moving image regions
have been extracted, the motion parameters for each moving region are
determined. This happens in the next step of the suggested technique.

Fig. 2. The segmentation of moving objects making use of the MDI approach and
pre-processing with morphological and separation operators
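
The hole-filling step can be illustrated with a plain dilation followed by an erosion on a binary mask. The sketch assumes square structural elements given by a radius and lets the window shrink at the image border; the radii used in the example are arbitrary and not the values used in the experiments, and the function names are hypothetical.

```python
def dilate(mask, radius):
    """Set a pixel if any pixel in its (2*radius+1)^2 window is set."""
    height, width = len(mask), len(mask[0])
    return [[int(any(mask[yy][xx]
                     for yy in range(max(0, y - radius), min(height, y + radius + 1))
                     for xx in range(max(0, x - radius), min(width, x + radius + 1))))
             for x in range(width)] for y in range(height)]


def erode(mask, radius):
    """Keep a pixel only if every pixel in its (clipped) window is set."""
    height, width = len(mask), len(mask[0])
    return [[int(all(mask[yy][xx]
                     for yy in range(max(0, y - radius), min(height, y + radius + 1))
                     for xx in range(max(0, x - radius), min(width, x + radius + 1))))
             for x in range(width)] for y in range(height)]


def fill_holes(mask, dilate_radius, erode_radius):
    """Dilation with one structural element, then erosion with another."""
    return erode(dilate(mask, dilate_radius), erode_radius)


# 7x7 motion mask: a 3x3 blob whose centre pixel was missed by the DI.
mask = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        mask[y][x] = 1
mask[3][3] = 0

closed = fill_holes(mask, 1, 1)
print(closed[3][3], sum(map(sum, closed)))  # 1 9
```

With equal radii this is the ordinary morphological closing: the hole is filled while the blob keeps its original 3x3 extent.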



2.2 Generation of Motion Blobs and CSC-Patches (Feature Level) 

In the next step, object candidates and their corresponding motion trajectories are
determined. This is based on the analysis of all previously found Motion-Blob (MB)
regions. For this task a further segmentation of the MB becomes necessary. Several
algorithms are potentially suitable for this purpose. The algorithm that combines most
of the required features is the Colour-Structure-Code (CSC). It produces stable
segmentations, is real-time capable and is easy to control. The CSC is an advanced region
growing approach that combines the advantages of fast local region growing algorithms
with the robustness of global methods [7,9]. In a region growing approach, small segments
are usually built first, which then grow bottom-up in a second step of the procedure.
That means the initial segments are merged with other segments that fulfil a similarity
criterion. Common problems in region growing approaches are transitive errors, which
emerge from local similarities: through successive local similarities, distant pixels
with different colours may fall into the same segment. The segmentation result also
depends on the order of the merging steps. The CSC avoids both disadvantages through its
hierarchical hexagonal topology. It assumes colour images whose pixels are arranged in a
hexagonal structure, which can easily be simulated for conventional orthogonal images.
In a hexagonal structure each pixel has six equidistant neighbours. Every second pixel
per row and every second pixel per column is the central pixel of a so-called “island of
level 0”, which consists of exactly this central pixel and its six neighbours. All pixels
other than the central pixels belong to exactly two islands of level 0. In general, every
fourth island of level i is the centre of an island of level i+1, which itself consists
of seven islands of level i. The CSC algorithm takes advantage of the numerous
overlapping islands. In the initial stage, a colour segmentation is conducted for each
island of level 0, resulting in segments of at least two pixels in size. During the
iterative process segments are merged or split, depending on a colour similarity
criterion. The parameter that controls the similarity is the Euclidean distance between
two colours in an arbitrary colour space [7,9].
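The transitive error mentioned above can be made concrete with a small sketch. It is not the CSC itself but a deliberately naive local region growing along one pixel row, using the Euclidean colour distance as similarity criterion; the function names and the example values are assumptions chosen for illustration.

```python
import math


def colour_distance(c1, c2):
    """Euclidean distance between two colours given as equal-length tuples."""
    return math.dist(c1, c2)


def chain_merge(colours, threshold):
    """Naive local region growing along a row of pixels: a pixel joins the
    current segment if it is similar to its direct predecessor only."""
    segments = [[colours[0]]]
    for prev, cur in zip(colours, colours[1:]):
        if colour_distance(prev, cur) <= threshold:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return segments


# Hypothetical colour ramp: neighbours differ by 20, the two ends by 80.
row = [(0, 0, 0), (20, 0, 0), (40, 0, 0), (60, 0, 0), (80, 0, 0)]
segments = chain_merge(row, threshold=25)
```

With a threshold of 25 all five pixels end up in a single segment, although the first and the last colour are far more than 25 apart. This is the chaining effect that a hierarchical scheme such as the CSC is designed to avoid.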




Fig. 3. The principle of the colour feature extraction of the motion region using CSC 

The real-time capable CSC approach gives superior results in segmentation quality and
computing time in comparison to other approaches such as the top-down Split-and-Merge or
the Recursive-Histogram-Splitting algorithm [10]. Thus, the CSC is an adequate method to
further subdivide a motion region. The search for correspondences between subdivided
motion regions in subsequent images is described in the next section.

2.2.1 Tracking Analysis 

To find and track objects in a longer image sequence, an abstraction level is introduced.
In this level, so-called Meta-Objects (MOs) are used to denote real-world objects. MOs
are specified by a set of CSC-Patches. The task of tracking an MO splits into two parts
(Fig. 4). The first part is a correlation at the feature level, i.e. a correlation of MBs
and CSC-Patches in image pairs. The second part is a correlation at the abstraction
level, i.e. the assignment of correlated CSC-Patches to the right MO. The relationship
between feature and abstraction level is explained in detail after the introduction of
the MB and CSC-Patch correlation.




Fig. 4. Relationship of the different correlation levels

2.2.1.1 Motion-Blob Correlation

The MB correlation (MBC) is the basis for the subsequent CSC-Patch correlation, which
itself is the basis for the MO correlation. The MBC is founded on the assumption that
real-world objects, which are the cause of any MB, can only cover a limited distance
between two frames. The MBC takes several forms. The simplest case is a 1:1 assignment,
where one MB in image i is assigned to another one in image i+1. In general, an arbitrary
m:n assignment is possible, where many MBs are simultaneously split and merged. The
following describes the algorithm for the automatic assignment of Motion-Blobs between
two frames. Initially there is an MB set $S_0$ of frame i and another set $S_1$ of frame
i+1. The task of the algorithm is to partition $S_0 = \{b_0, b_1, b_2, \ldots\}$ and
$S_1 = \{b'_0, b'_1, b'_2, \ldots\}$ entirely; additionally, each subset
$U0_j \subset S_0$ shall be assigned to a subset $U1_j \subset S_1$. The assignment is
based on spatial vicinity and a similarity measure for the area covered by $U0_j$ and
$U1_j$. To characterize the relation between $U0_j$ and $U1_j$ exactly, a definition by
cases becomes necessary. The different cases (Fig. 5) are the following:

(a) 0:1 - emerge; (b) 1:0 - vanish; (c) 1:1 - simple movement; (d) 1:n - split;
(e) n:1 - merge; (f) 2:2 - simultaneous split and merge of 2 blobs; (g) n:m -
simultaneous split and merge of n blobs at a time.

Case (a) occurs for any MB b' of $S_1$ that has not been correlated in the main loop of
the algorithm. Knowing which case of the assignment is present is of great help for the
subsequent Meta-Object correlation. This will be the point at which actual objects are
located inside an MB.
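The case distinction can be written down compactly. The following minimal sketch maps the sizes m and n of two assigned subsets to the case names listed above; the function name is hypothetical, and treating every 0:n assignment as "emerge" (and every m:0 as "vanish") is an assumption made for completeness.

```python
def classify_assignment(m, n):
    """Name the MB correlation case for m blobs in frame i that are assigned
    to n blobs in frame i+1."""
    if m < 0 or n < 0 or m + n == 0:
        raise ValueError("blob counts must be non-negative and not both zero")
    if m == 0:
        return "emerge"            # case (a)
    if n == 0:
        return "vanish"            # case (b)
    if m == 1 and n == 1:
        return "simple movement"   # case (c)
    if m == 1:
        return "split"             # case (d)
    if n == 1:
        return "merge"             # case (e)
    # Cases (f) and (g): several blobs on both sides.
    return "simultaneous split and merge"


cases = [classify_assignment(m, n) for m, n in [(0, 1), (1, 2), (2, 1), (2, 2)]]
```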



Fig. 5. Examples of MB correlation cases: 1:2 split, case (d); 2:1 merge, case (e);
2:2 split-and-merge, case (f)



2.2.1.2 CSC-Patch-Correlation 

Once all MBs are correlated, the second feature level can be processed. This task is
performed for each correlated MB-set pair, i.e. all related CSC-Patches are themselves
correlated on the basis of different matching criteria. The matching process is realized
through a combination of four separately weighted correlation tables, which achieves a
high accuracy at a low computational expense. The matching is done between two CSC-Patch
sets $M_0$ and $M_1$ that belong to previously correlated Motion-Blobs. The matching
criteria of CSC-Patches within the MB are (1) relative location, (2) inter-frame
distance, (3) colour value and (4) size.

Each correlation value is computed separately, assessed with respect to its reliability
and combined into a single overall similarity measure. When the similarity table is
filled, those CSC-Patches of $M_0$ and $M_1$ that have the highest correlation value V,
which must be at least as high as a minimal similarity threshold $V_{\min}$, become
mutually linked. All four similarity measures are weighted and combined into a single
correlation table, which is used to obtain the set of finally assigned CSC-Patches of the
successive frames. In the evaluation of the table, each CSC-Patch $p0_i$ is linked to
exactly one CSC-Patch $p1_j$ if there is a correlation value V that is maximal for the
respective pair. Additionally, V needs to be at least as high as the minimal threshold
$V_{\min}$. However, ambiguous cases may occur when the table is not square and several
patches $p0_i$ become linked to one patch $p1_j$. To resolve this problem, patches are
redistributed to their counterparts with the next best correlation value.
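The table-based matching can be sketched as follows. The paper does not state how the four tables are combined or how conflicts are resolved in detail, so the weighted average and the greedy best-first linking below are assumptions; the names `combine_tables`, `link_patches` and `v_min` are hypothetical.

```python
def combine_tables(tables, weights):
    """Weighted average of equally sized similarity tables (lists of rows)."""
    total = sum(weights)
    rows, cols = len(tables[0]), len(tables[0][0])
    return [[sum(w * t[r][c] for t, w in zip(tables, weights)) / total
             for c in range(cols)]
            for r in range(rows)]


def link_patches(table, v_min):
    """Greedy one-to-one linking. Pairs are visited from the highest to the
    lowest correlation value; a patch whose best partner is already taken
    falls back to its next best free partner, provided the value >= v_min."""
    candidates = sorted(
        ((v, r, c)
         for r, row in enumerate(table)
         for c, v in enumerate(row)
         if v >= v_min),
        reverse=True,
    )
    used_rows, used_cols, links = set(), set(), {}
    for _, r, c in candidates:
        if r not in used_rows and c not in used_cols:
            links[r] = c
            used_rows.add(r)
            used_cols.add(c)
    return links


# Hypothetical combined table: three patches of M0 (rows), two of M1 (columns).
table = [[0.9, 0.2],
         [0.8, 0.6],
         [0.7, 0.1]]
links = link_patches(table, v_min=0.5)
```

In the example all three patches of the first set prefer the same counterpart. The best pair is linked first, the second patch is redistributed to its next best counterpart, and the third patch stays unlinked because none of its remaining values reaches the threshold.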

2.2.1.3 Correlation in the Abstraction Level: Assignment of Meta-Objects

After accomplishing the correlation at the feature level, i.e. the correlation in the
last pair of blob frames, which resulted in the inter-frame assignment of MBs and
CSC-Patches, the assignment of Meta-Objects can be conducted. In contrast to the prior
correlation, which always regards only two different frames, this step considers a longer
history that is related to the Meta-Objects. So-called Meta-Object-Wrappers (MOWs) are
used to “wrap” the MO-specific information of a certain frame pair. Fig. 4 presents a
general overview of the connection between the different correlation levels. As shown in
Fig. 4, the abstraction level is connected to the feature level via the MOW, which has
access to the Motion-Blobs as well as to the CSC-Patches that are specific to a
particular MO. Meta-Objects are “kept alive” from frame to frame with the help of their
MOW. To initialise this process, we must specify how an MO can initially be generated. By
definition, in the initial stage of the tracking system each MB is one MO. This
assumption is very unlikely to hold, but in the course of time it is refined to best fit
the actual case.



3 Results 

We briefly demonstrate the analysis of a real image sequence that is overlaid by
image-specific disturbances (brightness change, shadow and small partial occlusion). The
results of the segmentation of moving objects by means of the suggested technique are
represented as motion trajectories (Fig. 6). The evaluation shows that the segmentation
of moving objects and the determination of the motion trajectory are reliable. A robust
segmentation of the moving objects (MB) was reached by applying the suggested technique.

Fig. 6 shows the resulting movement vectors in the form of a motion trajectory and a
tracking contour (MB). When using the conventional procedure, the contour no longer
describes the actual object position due to the shadow of that object. The suggested
technique, in contrast, follows the regarded image region over the entire image sequence,
because the object regions are stable due to the saturated colours of the moving regions.



4 Summary and Conclusion 

A robust algorithm has been developed for the automatic segmentation of moving objects
and robust tracking under the influence of disturbed image situations. The robustness is
reached by the use of an MDI approach for extracting moving objects. The correspondence
problem in the tracking is solved in the next stage of the algorithm via hierarchical
feature correlation of moving image regions between consecutive images. The matching
process is realized through the combination of four separately weighted correlation
tables, which achieves a high accuracy at a low computational expense. Each correlation
value is computed separately, assessed with respect to its reliability and combined into
a single overall similarity measure. With the suggested technique, reliable results are
achieved despite image-specific disturbances (brightness change, shadow and small partial
occlusion). A further improvement is the invariance to object enlargement, object
miniaturisation and object rotation.

Fig. 6. The analysis of moving objects in a real video sequence with the suggested
technique. Part A presents the analysis under the influence of brightness change, shadow
and small partial occlusion. Part B shows the results of the motion segmentation (MB) in
the first feature level for a sequence. Part C presents the analyses in the second level.

Abstract. In this paper, we propose an adaptive technique for the automatic
extraction and tracking of moving objects in video sequences that works robustly
under the influence of image-specific disturbances (e.g. brightness variations,
shadow and partial occlusion). For this technique, we apply colour information,
a neural recognition system and a recursive filtering algorithm to improve the
matching quality when disturbances occur. The suggested intensity-based technique
is adaptive and robust compared to conventional intensity-based methods.



1 Introduction 

The extraction of moving objects and the subsequent recognition of their trajectories
(tracking) in video sequences is of increasing importance for many applications. Examples
are video surveillance, motion estimation and human-computer interaction. Generally,
motion or tracking methods can be divided into four groups: (A) three-dimensional-based
methods [1], (B) feature-based methods [2,3], (C) deformable model-based methods [4] and
(D) intensity-based methods [5]. Conventional intensity-based methods of motion analysis
(e.g. Blockmatching or optical flow) usually do not operate reliably under the influence
of image-specific disturbances such as brightness variations, shadow, small grey tone
gradients and partial occlusion.

Our technique is an intensity-based method that can take advantage of characteristics
found in colour scenes. The technique developed here pursues the objective of
automatically segmenting each moving object and, furthermore, determining the motion
trajectory of these moving objects. For initial object selection, the motion vector field
(full search Blockmatching, BM) is used. Thereby moving blocks can be extracted. For the
following tracking analysis, the blocks with similar motion parameters (if they fulfil a
homogeneity criterion) are combined into object candidates, which are transformed into an
adaptive colour space. This adaptive colour space is generated as a function of the image
content. With this adaptive colour space, we achieve an optimal channel reduction and a
separation of the luminance and chrominance information on the one hand, and a high
dynamic gain on the other. The colour components are calculated via the Karhunen-Loeve
Transformation (KLT) [6,7]. This method provides the optimal subspace that minimizes the
mean square error between the given set of vectors and their projections onto the
subspace. The resulting image regions represent the object candidates that are used for
the solution of the correspondence problem in video sequences.

For the successful tracking of the extracted moving regions in the video sequences, a
multi matching method (M) is applied. It differs from simple Blockmatching (BM) in that
object-adapted regions (extracted in the initial step) are used for further tracking in
the sequences instead of fixed blocks. Thus the problems of fixed blocks (e.g. the
aperture problem) are eliminated. This is because the object-adapted regions contain the
energetic features (e.g. edges, corners etc.) that are used to improve the fail-safe
characteristics of the motion analysis. Another advantage of our approach is the
extension of the sample matching in M to colour components in order to obtain reliable
results. This is because the colour components of the adaptive colour space are invariant
to modifications of the intensity value caused by the overlay of shadow or brightness
fluctuations. For further improvement regarding partial occlusion, the prehistory is
evaluated by means of a recursive filtering algorithm [8,9]. This paper consists of two
main parts: the first describes the adaptive technique, and the second presents some
experimental results of this technique.



2 Adaptive Technique for Motion Analysis 

The suggested technique is described by two processing levels (Fig. 1). The first level
deals with the segmentation of the moving objects and the analysis of the colour
information. The second level of this system comprises the M, a neural recognition system
and a modified recursive algorithm for the estimation of the displacement vectors under
the influence of image-specific disturbances (partial occlusion).

Fig. 1. The simplified suggested adaptive technique for the automatic detection and
tracking analysis of objects in colour video sequences

2.1 Segmentation of Objects Using Motion Field

In motion analysis, it is desirable to apply automatic techniques for the selection of
moving objects from image sequences. In this work we use a motion vector field (full
search BM) for automated initial object selection in the RGB image sequence. As a result
of the BM, one obtains for each reference block a displacement vector that describes the
determined motion of the represented image region. The displacement vectors are
calculated via a Q-criterion (e.g. mean absolute difference, MAD). The MAD criterion was
used because of its small implementation cost. More specifically, denoting by I(s,k) the
intensity value of the reference image at pixel s and time k, and by R the search region,
the displacement vector $v = (v_x, v_y)$ is obtained by minimizing the MAD over the
search region. M and N (Eq. 1) are the dimensions of the reference block.

$\mathrm{MAD}(v) = (M \cdot N)^{-1} \sum_{s \in R} \left| I(s,k) - I(s - v,\, k - \Delta k) \right|$    (1)
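A full search with the MAD criterion of Eq. 1 can be sketched in a few lines. The sketch assumes greyscale frames stored as lists of rows, a frame distance of one, and a square search range of `radius` pixels; these choices and all function and parameter names are illustrative assumptions rather than the authors' implementation.

```python
def mad(prev, cur, top, left, size, dy, dx):
    """Mean absolute difference between the block of the current frame at
    (top, left) and the block of the previous frame shifted back by (dy, dx),
    i.e. |I(s, k) - I(s - v, k - 1)| averaged over the M x N block."""
    m, n = size
    total = 0
    for y in range(top, top + m):
        for x in range(left, left + n):
            total += abs(cur[y][x] - prev[y - dy][x - dx])
    return total / (m * n)


def full_search(prev, cur, top, left, size, radius):
    """Evaluate every displacement within +/- radius and return the pair
    (best displacement (dy, dx), its MAD value)."""
    m, n = size
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip displacements that would read outside the previous frame.
            if not (0 <= top - dy and top - dy + m <= height
                    and 0 <= left - dx and left - dx + n <= width):
                continue
            score = mad(prev, cur, top, left, size, dy, dx)
            if best is None or score < best[1]:
                best = ((dy, dx), score)
    return best


def make_frame(top, left):
    """Hypothetical 6 x 6 test frame with a bright 2 x 2 square."""
    frame = [[0] * 6 for _ in range(6)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 100
    return frame


# The square moves one row down and two columns to the right.
vector, score = full_search(make_frame(1, 1), make_frame(2, 3), 2, 3, (2, 2), 2)
```

The cost of the exhaustive search grows with the square of the search radius, which is one reason why a prediction of the motion vector is useful for narrowing the search area.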

The resulting vector field of the individual blocks can be analysed block by block,
whereby blocks with identical displacement vectors (length, direction and neighbourhood)
are combined into an object. These blocks establish the object candidates (cluster). The
blocks outside of this cluster are analysed as outliers or other objects because their
displacement vectors do not belong to this cluster (different direction). The accuracy of
the exact object delimitation can only be increased by extending this segmentation
procedure hierarchically. For the following tracking analysis, the selected object
candidates are transformed into an adaptive colour space, so that a good result is
achieved under the influence of disturbances (e.g. brightness variations, shadow and
small grey tone gradients) in the sequence.

2.2 The Adaptive $K_1K_2K_3$ Colour Space

Good properties of a colour space with respect to tracking are, among others,
illumination invariance and separability. The colour of the object then remains constant
and distinct, which makes tracking and detection easier and more reliable. A further
demand is an increase of the dynamic gain. To obtain optimal channel reduction and a high
dynamic gain, we use an adaptive colour space after the segmentation of the moving
region. In the following linear transformation, the components K of the adaptive
$K_1K_2K_3$ colour space for a pixel in the RGB colour space are obtained as:

$[K_1\ K_2\ K_3]^T = [Q_{ij}]\,[R\ G\ B]^T$    (2)

For the generation of the transformation matrix $Q_{ij}$ we use the KLT [6], in which the
components $K_i$ are aligned in the directions of the largest variances to obtain the
largest possible contrast. The first component $K_1$ contains the brightness information.
This component provides a larger dynamic gain than the conventional average-value-based
brightness H [7]. The other components $K_2$ and $K_3$ represent the chrominance
information, which is used to suppress the influence of shadow and brightness variations.
This is possible because these channels are formed as difference relations. This
behaviour corresponds to visual colour perception in that the hue remains approximately
constant when the saturation or the brightness of a region varies. For the description of
the variance of each component (part variance), a quality measure (Eq. 3) is defined as
control criterion for the data in the respective channel. While the quality measure
indicates the preservation of the part variance in the respective channel, $E_3$
describes the variance removed with the component $K_3$. The evaluation of real image
scenes shows that the value of the quality measure is above approximately 98%. This is
because grey tones and weakly saturated colours predominantly occur in real scenes.
Therefore the component $K_3$ is removed for the tracking analysis of objects without
significant information loss.



$E_i = \frac{\lambda_i}{\sum_{j=1}^{n=3} \lambda_j}$ and $E_3 = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}$    (3)



2.3 The Motion Analysis 

For the successful tracking of the extracted moving regions in the video sequences, a
multi matching method (M) is used for the determination of the displacement vectors. In
this concept, the tracking quality during the correspondence determination in image
sequences is improved because, on the one hand, object-adapted image regions are utilized
and, on the other hand, colour information is evaluated to improve the fail-safe
characteristics. In order to exploit the characteristics of the individual channels
optimally (Table 1), it is necessary to combine the channel-specific Q-criteria (e.g.
MAD) into a total criterion ($\mathrm{MAD}_{total}$, Eq. 4).



Table 1. The characteristics of the MAD criterion under the influence of disturbances

Criterion                 Small grey tone   Shadow and brightness   Partial      Another
                          gradients         fluctuation             occlusion    disturbance
MAD_K1                    reliable          unreliable              unreliable   reliable
MAD_K2                    reliable          reliable                unreliable   unreliable
Total criterion (Eq. 4)   reliable          reliable                unreliable   reliable



From this the respective displacement vector is calculated. For this purpose a
prioritisation of the MAD criteria according to their reliability is suggested.

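$\mathrm{MAD}_{total}(v)^t = (w_1^t + w_2^t)^{-1} \cdot \{\, w_1^t \cdot \mathrm{MAD}_{K_1}(v)^t + w_2^t \cdot \mathrm{MAD}_{K_2}(v)^t \,\}$    (4)

$K_1, K_2$: channels of the adaptive colour space.

$w_i^t$: weight factor at time t, $0 \le w_i \le 1$, whereby
$w_i = \begin{cases} 0.4 < w_i < 1 & \text{reliable} \\ 0 & \text{else (unreliable)} \end{cases}$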
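How the weights act on the two channel-specific criteria can be illustrated with a short sketch of Eq. 4. It assumes that each MAD surface is stored as a dictionary from displacement to MAD value and that weights up to 0.4 are set to zero; the function names, the data layout and the example numbers are assumptions.

```python
def total_mad(mad_k1, mad_k2, w1, w2, reliable_from=0.4):
    """Combine two channel-specific MAD surfaces (dict: displacement -> MAD)
    into the total criterion. Weights not above `reliable_from` are treated
    as unreliable and set to zero. Returns None if both channels are
    unreliable, which the text interprets as a partial occlusion."""
    w1 = w1 if w1 > reliable_from else 0.0
    w2 = w2 if w2 > reliable_from else 0.0
    if w1 + w2 == 0.0:
        return None
    return {v: (w1 * mad_k1[v] + w2 * mad_k2[v]) / (w1 + w2) for v in mad_k1}


def best_displacement(surface):
    """Displacement with the smallest value of the criterion."""
    return min(surface, key=surface.get)


# Hypothetical surfaces: shadow distorts K1 so that its minimum is misplaced.
mad_k1 = {(0, 0): 1.0, (1, 0): 5.0}
mad_k2 = {(0, 0): 4.0, (1, 0): 0.5}
shadow_case = best_displacement(total_mad(mad_k1, mad_k2, w1=0.0, w2=0.9))
```

With the unreliable channel suppressed, the minimum of the total criterion follows the chrominance channel. If both weights drop to zero no displacement can be derived from the measurement, and a prediction has to take over.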

The appropriate weighting factors in this equation are determined by means of an
artificial neural recognition system. This motivates the next part of the paper.

2.3.1 Artificial Neural Recognition System

The fundamental idea of generating the total criterion (Eq. 4) is that a neural network
supplies an output activation that weights the Q-criterion. Using these weighting factors
$w_i$, the two channel-specific Q-criteria are summed up. As a result, the influence of
shadow or lighting changes can be reduced considerably by using the total criterion
(Eq. 4), so that good results can be obtained for the determination of displacement
vectors. For the determination of the weighting factors (Eq. 4), a three-layer
feed-forward network topology of a Multi-Layer Perceptron (MLP) is applied.

The feature vector (learning data) for the MLP is obtained from an individual MAD
criterion (absolute features) and/or from two consecutive MAD criteria (difference
features). Such difference features are, e.g., the change of the minimum value, the
change of the MAD margin values, the change of the average value and the change of the
surrounding region values of the MAD function. Thus, a better description of the image
interference is obtained by combining the difference and absolute features. With the
weighting factors $w_i$, not only is the reliability of the MAD criteria evaluated under
the influence of shadow or lighting changes, but a partial occlusion can also be
detected. This is the case if the weighting factors $w_1$ and $w_2$ of the two components
fall below a threshold (approximately zero) at the same time. Then a partial occlusion
occurs as a disturbance in the sequence, and the computed motion vector is unreliable. To
solve this problem, a recursive algorithm is used, which is described in the following
section.

2.3.2 A Modification of the Kalman Filter 

Our own experiments on real-world greyscale images have shown that the measured values
(motion vectors) calculated by this procedure are corrupted by noise. Especially in
applications with high requirements regarding the quality of motion vectors, it is
necessary to minimize the influence of noise. A recursive filtering algorithm (Kalman
filter, KF [7,8]) is used for this task. The position of the minimum of the Q-criterion
(motion vector) is the input of the filter.

Besides the capability of the filter to reduce the influence of noise, there are some
other important advantages. For instance, the inclusion of the internal system model
makes it possible to predict the motion vector at the next time step. This can be used
effectively to reduce the search area in the matching algorithm. However, in problematic
image situations (e.g. partial occlusion of the tracked image region), outliers in the
calculated motion vectors cause false estimates of the KF.

In order to cope with this problem, a modification of the conventional KF has been
proposed in [10]. This modified KF is used in the proposed technique. It estimates the
quality of the respective motion vector based on the weighting factors $w_1$ and $w_2$
(Fig. 1). That means that incoming (false) motion vectors are weighted less than before,
and the KF increasingly relies on its internal model for the estimation. As a result, the
influence of partial occlusions of the tracked image region on the KF estimates is
reduced considerably.
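The effect of such a reliability weighting can be illustrated with a scalar Kalman filter. The sketch below is not the modification from [10], whose details are not given here; it assumes a random-walk model for one component of the motion vector and inflates the measurement noise as a single reliability weight w decreases, with w = 0 skipping the measurement update entirely. The class name and all parameter values are hypothetical.

```python
class WeightedKalman1D:
    """Scalar Kalman filter with a random-walk state model. The measurement
    noise r is divided by the reliability weight w, so unreliable
    measurements pull the estimate less; w == 0 ignores the measurement."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.25):
        self.x = x0   # state estimate (one motion vector component)
        self.p = p0   # variance of the estimate
        self.q = q    # process noise variance
        self.r = r    # measurement noise variance for a fully reliable value

    def step(self, z, w):
        # Prediction: the state is assumed constant, the uncertainty grows.
        self.p += self.q
        if w <= 0.0:
            return self.x  # rely on the internal model only
        r_eff = self.r / w
        gain = self.p / (self.p + r_eff)
        self.x += gain * (z - self.x)
        self.p *= 1.0 - gain
        return self.x


# Hypothetical outlier measurement of 10 while the true component is near 2.
trusted = WeightedKalman1D(2.0).step(10.0, w=1.0)
doubted = WeightedKalman1D(2.0).step(10.0, w=0.1)
ignored = WeightedKalman1D(2.0).step(10.0, w=0.0)
```

The lower the weight, the closer the estimate stays to the prediction of the internal model, which is the behaviour described above for partially occluded regions.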
3 Results of the Analysis

In the following sequence we demonstrate the suitability and capability of the proposed
technique for motion analysis. In this sequence, the object of interest is overlaid by
shadow in image k=5 and by lighting modifications in image k=10. These influences lead to
clear modifications of the intensity values (e.g. darkening, Fig. 2 A). In the first step
the object is selected using the motion vector field. The resulting vector field is
evaluated by combining blocks with identical displacement vectors into one object
candidate. These blocks establish the object candidates, which are afterwards transformed
into the adaptive colour space to improve the fail-safe characteristics (suppression of
the influence of shadow and lighting modifications). For the object-adapted image region
according to Fig. 2, the calculated components are forwarded to the second processing
level of the technique. There, the motion analysis of the image region takes place via M.
This guarantees the determination of reliable motion vectors even in extreme cases in
which the real image objects are overlaid by disturbances. Generally, the M operates like
the BM and shows good results, since the block dimension is adapted to the object
boundaries and two channels of the adaptive colour space are used for the computation of
the Q-criterion (MAD). Here the MAD function is calculated according to Eq. 4 for all
discrete displacement vectors v in the channels $K_1$ and $K_2$. Under the overlay of
shadow, deformations and several minima occur in the Q-criterion. This occurs in the
light-intensity-dependent component $K_1$ (Fig. 2 C) as well as in the total Q-criterion
if the matching is accomplished using the image contents in the RGB colour space or in
greyscale image sequences. These incorrect results in the determination of the
displacement vectors cause the tracked image region to leave the originally pursued
region (Fig. 2 E), so that the motion trajectory does not describe the actual position of
the object. In contrast, the Q-criterion of the component $K_2$ allows an error-free
motion estimation due to a well-developed minimum (Fig. 2 C). For the automatic weighting
of the channel-specific MAD (Eq. 4), a neural recognition system is used to suppress the
faulty MAD function.

The training data for the neural network are formed from a set of feature vectors of the
MAD function, which in this case are generated from a test sequence that contains
sufficiently significant image interferences. Here the unreliable channel-specific
$\mathrm{MAD}_{K_1}$ of the component $K_1$ is suppressed. In contrast to
$\mathrm{MAD}_{K_1}$, the reliable similarity criterion $\mathrm{MAD}_{K_2}$ contributes
with a high weight to the total criterion (Fig. 2 C). This can be demonstrated clearly
via the MLP. When analysing the calculated $\mathrm{MAD}_{K_1}$ functions, MLP1 supplies
an output activation of zero ($w_1 = 0$) at time k=5 (Fig. 2 D). The other network, MLP2,
evaluates the $\mathrm{MAD}_{K_2}$ of the $K_2$ component as reliable at the same time
($w_2 = 0.9$). The total criterion allows an error-free motion estimation (motion
trajectory, Fig. 2 E) due to a good minimum. The total criterion ensures improved results
compared to the conventional procedures in motion analysis (Fig. 2 E). If the moving
object is not occluded in one frame and its match is partially occluded in the following
frame, then the displacement vectors estimated for this object region may have some error
caused by the changed shape of the region due to the partial occlusion. In this case the
weighting factors $w_1$ and $w_2$ of the two components fall below a threshold
(approximately zero) at the same time.

Fig. 2. The analysis of the image sequence under the influence of the shadow effect in
image k=5 and lighting modifications in image k=10. The calculated MAD functions for the
object of interest are shown in C. The weight $w_1$ shows that the $\mathrm{MAD}_{K_1}$
criterion can no longer be evaluated starting at time k=5 (D): MLP1 supplies an output
activation $w_1$ of zero, whereas MLP2 evaluates $\mathrm{MAD}_{K_2}$ as reliable. E shows
the results of the tracking analysis for an image region in a video sequence with
conventional methods and with the suggested system structure (b). The right image shows
the analysis for another sequence under the influence of disturbances (shadow).

To solve this problem, those displacement vectors are used as input data for a recursive
estimation process, which additionally includes the temporal 'prehistory' of the motion
in the analysis. This procedure generates a back-up trajectory, which is valid within a
limited area in case the measurement information fails due to occlusion.



4 Summary and Conclusion 

In this paper, a technique was suggested for the motion estimation of objects in video
sequences. For automatic object selection, a motion vector field was used. Because the BM
is quite sensitive to disturbances when calculating displacements, the use of an adaptive
colour space during the tracking of the objects was suggested. For successful tracking of
the extracted moving regions in the video sequences, the M was used. The channel-specific
criteria are combined into a total criterion according to their reliability. Owing to
this adaptive prioritisation of the components, the total criterion shows more exact and
more stable results than conventional procedures, in particular in problematic measuring
situations. In the suggested technique, a modified recursive filtering algorithm was
applied to reduce the influence of partial occlusions on the tracked image region. With
the suggested technique, reliable results are achieved despite the influence of
disturbances.

Abstract. An effective method for the removal of impulse noise in corrupted color
images is proposed. The method consists of two steps. Outliers are first
detected using spatial relationships between the color image components. Then
the detected noise pixels are replaced with the output of the vector median filter
over a local spatially connected area excluding the outliers, while noise-free
pixels are left unaltered. Simulation results on a test color image show a
superior performance of the proposed filtering algorithm compared with the
conventional vector median filter. The comparisons are made using mean
square error, mean absolute error, and subjective human visual error
criteria.



1 Introduction 

Color images are often corrupted by impulse noise due to a noisy sensor or channel
transmission errors. The major objective of impulse noise removal is to suppress the
noise while preserving the image details. Color images can be considered as
two-dimensional three-channel signals. Monochrome image processing techniques
such as median and, more generally, order statistics filters [1-4], which remove
impulse noise well, can therefore be applied to each color component plane.
However, such component-wise noise removal does not give desirable results because
the output values may exhibit chromaticity shifts. Therefore, it is desirable to
exploit the dependence between the color components. Recently, an effective
nonlinear vector filter called the vector median filter (VMF) [5] was proposed. The
VMF and its variants [6] are among the most popular approaches to noise
removal in color images. However, because these approaches are typically
applied uniformly across a color image, they also tend to modify pixels that are
undisturbed by noise. Moreover, they are prone to edge jitter when the percentage of



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 113-120, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 






V. Kober, M. Mozerov, and J. Alvarez-Borrego 



impulse noise is large. Consequently, the effective removal of impulses often comes at the
expense of blurred and distorted features. Recently, nonlinear filters for monochrome
images with a signal-dependent shape of the moving window have been proposed [7].
In this paper, we use that approach to suppress impulse noise in highly
corrupted color images. First, outliers are detected using spatial relationships between
the color components. Then the detected noise pixels are replaced with the output of
the VMF computed over a local spatially connected area from which the outliers
are excluded. In the case of independent channel impulse noise, the proposed detector
greatly reduces the probability of missing an impulse. The performance of the
proposed filter is compared with that of the conventional VMF algorithm.

The presentation is organized as follows. In Section 2, we present a novel efficient 
algorithm for detection of noise impulses. A modified vector median filter using the 
proposed detector is also described. In Section 3, with the help of computer 
simulation we test the performance of the conventional and proposed filter. Section 4 
summarizes our conclusions. 



2 Spatially Adaptive Algorithm for Detection and Removal
of Impulse Noise

In impulse noise models, corrupted pixels are often replaced with values near the
maximum and minimum of the dynamic range of a signal. In our experiments, we
consider a similar model in which a noisy pixel can take a random value either from
sub-ranges of the maximum or the minimum values with a given probability. The
distribution of impulse noise in the sub-ranges can be arbitrary. To detect impulse
noise in a color image, we use the concept of a spatially connected neighborhood. An
underlying assumption is as follows: image pixels geometrically close to each other
belong to the same structure or detail. The spatially connected neighborhood is
defined as a subset of pixels $\{v_{n,m}\}$ of a moving window, which are spatially
connected with the central pixel of the window, and whose values deviate from the
value of the central pixel $v_{k,l}$ by at most the predetermined quantities $-\varepsilon_v$ and $+\varepsilon_v$ [7]:

$$CEV(v_{k,l}) = CON\left(\{v_{n,m} : v_{k,l} - \varepsilon_v \le v_{n,m} \le v_{k,l} + \varepsilon_v\}\right). \qquad (1)$$

The size and shape of a spatially connected neighborhood are dependent on 
characteristics of image data and on parameters, which define measures of 
homogeneity of pixel sets. So the spatially connected neighborhood is a spatially 
connected region constructed for each pixel, and it consists of all the spatially 
connected pixels, which satisfy a property of similarity with the central pixel. 
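
The construction of a spatially connected neighborhood can be sketched as a flood fill that starts at the central pixel of the window. The fragment below is a minimal illustration and not the authors' implementation: it assumes a single-channel window stored as a list of rows, 4-connectivity between pixels, and one symmetric tolerance `eps`. The function name `connected_neighborhood` is hypothetical.

```python
from collections import deque


def connected_neighborhood(window, eps):
    """Return the (row, col) positions that are 4-connected to the central
    pixel of `window` and whose values lie within +/- eps of its value."""
    rows, cols = len(window), len(window[0])
    centre = (rows // 2, cols // 2)
    v0 = window[centre[0]][centre[1]]
    region = {centre}            # the central pixel always belongs to the set
    queue = deque([centre])
    while queue:
        r, c = queue.popleft()
        # Assumption: 4-connectivity (up, down, left, right).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(window[nr][nc] - v0) <= eps):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

For an isolated impulse the returned set contains only the central pixel, which is the situation the size-based detector described next is designed to catch.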

The vector median filter replaces the color vector of each pixel by the vector median 
value. However, the VMF is often implemented uniformly across a color image. This 
leads to undesired smoothing of image details, which are uncorrupted by impulse 
noise. Therefore, the quality of the filtering depends on an impulse noise detector. 
The detector must decrease the probabilities of impulse noise miss and false detection. 
In other words, it should detect as many noisy pixels as possible, while false
detections should be as few as possible to preserve image details. We propose to detect
outliers with the help of spatial relations between the color components. We assume
that a spatially connected region corrupted with impulse noise is relatively small
compared to the details of the image. Therefore, impulsive noise can be detected by
checking the size of its region. If the size is less than a given threshold value, say M,
impulse noise is detected. Obviously, such a detector misses impulse clusters of size
greater than M. The probability of occurrence of a four-connected noise cluster of
size M in a moving window can be computed using the addition formula of
probabilities. The noise cluster occurs simultaneously with one of the mutually
exclusive events $H_1, \ldots, H_N$. Here $H_k$ is the event denoting that there is a noise cluster
of exactly M noise impulses surrounded by uncorrupted image pixels. The
probability of occurrence of a noise cluster of size M at a given image pixel is
given as

$$\Pr(M) = \sum_{k=1}^{N} \Pr(H_k), \qquad (2)$$

where the probability of the event $H_k$ is $\Pr(H_k) = P^M (1-P)^{E_k(M)}$, P is the probability
of impulse noise, and $E_k(M)$ is the
number of surrounding uncorrupted image pixels. Taking into account that some of the
probabilities $\Pr(H_k)$ are equal, Eq. (2) simplifies to

$$\Pr(M) = P^M \sum_{k=1}^{K(M)} C_k(M) (1-P)^{E_k(M)}, \qquad (3)$$

where $K(M)$, $C_k(M)$, $E_k(M)$ are coefficients determined from the geometry (binary
region of support) of the cluster of noise.



Table 1. Coefficients for calculating the probability of impulsive clusters

Size of cluster M   K(M)   k   C_k(M)   E_k(M)
1                   1      1   1        4
2                   1      1   4        6
3                   2      1   12       7
                           2   6        8
4                   3      1   36       8
                           2   32       9
                           3   8        10
5                   5      1   5        8
                           2   100      9
                           3   140      10
                           4   60       11
                           5   10       12



For a given image pixel, $K(M)$ is the number of groups, each of which contains $C_k(M)$
events $H_k$ with equal probabilities $\Pr(H_k)$, $k = 1, \ldots, K(M)$. For example, the number
of groups for M=2 is K(2)=1, and the number of surrounding four-connected
uncorrupted pixels is $E_1(2)=6$. The number of events is $C_1(2)=4$ (four possible
variants of the noise cluster on the grid including the given pixel). With the help of
Table 1 and Eq. (3), the probability of occurrence of a four-connected impulse noise
cluster of size M can be easily calculated. Table 2 presents the probability of
occurrence of an impulse cluster of size M versus the probability of impulse noise on a
rectangular grid. We see that when the probability of impulse noise is high, the
occurrence of impulse clusters is very likely.

Table 2. The probability of occurrence of impulse clusters of the size M versus the probability
P of impulse noise.

     Probability of impulse noise
M    P=0.01       P=0.1        P=0.2
0    0.99         0.9          0.8
1    9.6x10^-3    6.5x10^-2    8.2x10^-2
2    3.7x10^-4    2.1x10^-2    4.2x10^-2
3    1.7x10^-5    8.3x10^-3    2.8x10^-2
4    7x10^-7      3x10^-3      1.8x10^-2
5    2.8x10^-8    1.1x10^-3    1.1x10^-2



Table 1 lists the coefficients for $M \le 5$. In a similar manner, the coefficients for
greater sizes of noise clusters can be calculated.
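
The single-channel cluster probability, i.e. P^M times the sum over k of C_k(M) (1-P)^E_k(M), is easy to evaluate from the Table 1 coefficients. The sketch below is an illustration only: the dictionary `TABLE_1` and the function `cluster_probability` are hypothetical names, and the coefficients are copied from Table 1 as (C_k(M), E_k(M)) pairs.

```python
# Table 1 coefficients: cluster size M -> list of (C_k(M), E_k(M)) pairs.
TABLE_1 = {
    1: [(1, 4)],
    2: [(4, 6)],
    3: [(12, 7), (6, 8)],
    4: [(36, 8), (32, 9), (8, 10)],
    5: [(5, 8), (100, 9), (140, 10), (60, 11), (10, 12)],
}


def cluster_probability(m, p):
    """Probability that a four-connected noise cluster of exactly m impulses
    covers a given pixel, for single-channel impulse probability p."""
    if m not in TABLE_1:
        raise ValueError("coefficients are tabulated for M = 1..5 only")
    # P^M for the m corrupted pixels, (1-P)^E_k for the clean pixels around them.
    return p ** m * sum(c * (1.0 - p) ** e for c, e in TABLE_1[m])
```

Evaluating `cluster_probability(m, p)` for m = 1..5 and p in (0.01, 0.1, 0.2) gives values of the same magnitude as the entries of Table 2, for example about 0.021 for m = 2 and p = 0.1.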

Suppose that impulsive noise is independent in L signal channels. The probability of
occurrence of a noise cluster of size M at a given image pixel can be written as

$$\Pr(M) = \left(P^L\right)^M \sum_{k=1}^{K(M)} C_k(M) \left(1 - P^L\right)^{E_k(M)}. \qquad (4)$$

For a color image (L=3), the probability of impulse noise with M=1 and P=0.1
becomes 0.000996 (compare to 0.065 for L=1). We see that the probability of
multichannel impulse noise greatly decreases when the number of channels increases.
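
The two figures quoted for the single-impulse case can be checked by hand. The worked example below assumes the Table 1 coefficients for M=1 (K(1)=1, C_1(1)=1, E_1(1)=4) and takes the probability that a pixel is corrupted in all L independent channels to be P^L.

```latex
\Pr(1)\big|_{L=3} = P^{3}\,\bigl(1 - P^{3}\bigr)^{4} = 10^{-3}\,(0.999)^{4} \approx 9.96 \times 10^{-4},
\qquad
\Pr(1)\big|_{L=1} = P\,(1 - P)^{4} = 0.1 \cdot (0.9)^{4} \approx 6.56 \times 10^{-2}.
```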
The algorithm of impulse noise detection in a color image is as follows. First,
we construct spatially connected neighborhoods in the R, G, and B channels independently.
The parameters of the spatially connected neighborhoods in the channels are chosen on the
basis of either a priori or measured information about the spread of the signal to be
preserved.

Let ICON and UCON be the two sets obtained as the intersection and the union, respectively,
of the regions of support of the spatially connected channel neighborhoods $CON_R$, $CON_G$, $CON_B$.
If the number of elements in ICON is small, then impulse noise exists in at least one
channel. If the size of UCON is large, then a detected impulse is
probably in one channel. If both sets are small, impulse noise is in all three channels.
However, the probability of this event is very small. Finally, for the moving window
of 3x3 pixels we use the following threshold values for the sets: if the size of ICON <
1 and UCON < 1, the central pixel is corrupted in three channels; if the size of ICON
< 2 and the difference between the two sizes > 3, then the central pixel is corrupted in one
or two channels; if the size of ICON > 2 and the difference < 2, the central pixel is not
corrupted and there is a high local signal variation in the channels.

Finally, the detected impulse noise is replaced with the output of the VMF computed
over a local spatially connected area excluding the outliers. The conventional VMF is
defined as follows. For a set of N vectors in the RGB color space
$S = \{x_1, x_2, \ldots, x_N\}$, $x_n = (R_n, G_n, B_n)$, with a vector norm $\|x\|_L$, the vector median filter is
given by

$$x_{VM} = VMF\{x_1, x_2, \ldots, x_N\},$$

with

$$x_{VM} = (R_{VM}, G_{VM}, B_{VM}) \in S, \qquad \sum_{i=1}^{N} \|x_{VM} - x_i\|_L \le \sum_{i=1}^{N} \|x_j - x_i\|_L, \quad \forall x_j \in S. \qquad (5)$$

This operation selects the vector in the moving window that minimizes the sum
of the distances to the other N-1 vectors with respect to the L-norm. We propose to find the
median value among the vectors belonging only to the set of spatially connected
neighborhoods with the region of support UCON excluding corrupted pixels.
However, if the size of UCON is small, a small region surrounding
UCON is used for noise filtering. The proposed algorithm is an extension of the algorithm [7] to color
images, and it can be written as



$$\hat{v}_{n,m} = \begin{cases} v_{n,m}, & \text{if } SIZE(ICON) > Th\_ICON \\ VMF\left(v_{n,m}\{UCON\} - v_{n,m}\{ICON\}\right), & \text{if } SIZE(UCON) > Th\_UCON \\ VMF\left(v_{n,m}\{\widetilde{UCON}\} - v_{n,m}\{UCON\}\right), & \text{otherwise} \end{cases} \qquad (6)$$

where $\hat{v}_{n,m}$ is the filter output; Th_ICON and Th_UCON are threshold values of outlier detection for the sets
ICON and UCON, respectively; "-" denotes the set difference operation; $v_{n,m}\{S\}$
is the subset of pixels of the moving window with the region of support S; $\widetilde{UCON}$ is a
small region surrounding UCON. The algorithm starts from the first line of Eq. (6).
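
The three-way rule of Eq. (6) can be sketched for a single window as follows. This is a minimal illustration under stated assumptions, not the authors' code: the window is a list of rows of (R, G, B) tuples, the channel neighborhoods use 4-connectivity, the vector norm is taken to be Euclidean, and the "small region surrounding UCON" is approximated by the whole window. All function names are hypothetical.

```python
from collections import deque


def channel_neighborhood(window, channel, eps):
    """Positions 4-connected to the centre whose `channel` value is within eps."""
    rows, cols = len(window), len(window[0])
    centre = (rows // 2, cols // 2)
    v0 = window[centre[0]][centre[1]][channel]
    region, queue = {centre}, deque([centre])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(window[nr][nc][channel] - v0) <= eps):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def vector_median(vectors):
    """Vector of the set with the smallest summed Euclidean distance to the rest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(vectors, key=lambda v: sum(dist(v, u) for u in vectors))


def filter_centre(window, eps=10, th_icon=1, th_ucon=4):
    """Return the filtered value of the central pixel of an RGB window."""
    rows, cols = len(window), len(window[0])
    centre = window[rows // 2][cols // 2]
    regions = [channel_neighborhood(window, ch, eps) for ch in range(3)]
    icon = regions[0] & regions[1] & regions[2]   # intersection of supports
    ucon = regions[0] | regions[1] | regions[2]   # union of supports
    if len(icon) > th_icon:
        return centre                              # first line: pixel kept as is
    if len(ucon) > th_ucon:
        support = ucon - icon                      # second line of the rule
    else:
        # Assumption: the surrounding region is the rest of the window.
        everything = {(r, c) for r in range(rows) for c in range(cols)}
        support = everything - ucon                # third line of the rule
    if not support:
        return centre
    return vector_median([window[r][c] for r, c in sorted(support)])
```

The default arguments mirror the values reported in the experiments (tolerance 10, Th_ICON=1, Th_UCON=4); they are parameters here so that other settings can be tried.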



3 Computer Experiments 

Processing of color images degraded by impulse noise is of interest in a
variety of tasks. Computer experiments are carried out to illustrate and compare the
performance of the conventional and proposed algorithms. We are interested in
how well each filter performs, relative to the other, in terms of noise removal
and preservation of fine structures. However, it is difficult to define an error criterion
that accurately quantifies image distortion. In this paper, we base our comparisons
on the mean square error (MSE), the mean absolute error (MAE), and a subjective
visual criterion. The empirical normalized mean square error is given by
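
$$MSE = \frac{\sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{3} \left(v_{n,m,k} - \hat{v}_{n,m,k}\right)^2}{\sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{3} v_{n,m,k}^2}, \qquad (7)$$

where $\{v_{n,m,k}\}$ and $\{\hat{v}_{n,m,k}\}$ are the original image and its estimate (filtered image),
respectively. In our simulations, N=320, M=200 (320x200 image resolution), and
each pixel has 256 levels of quantization. The empirical normalized mean absolute
error is defined as:

$$MAE = \frac{\sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{3} \left|v_{n,m,k} - \hat{v}_{n,m,k}\right|}{\sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{k=1}^{3} \left|v_{n,m,k}\right|}. \qquad (8)$$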



The use of these error measures allows us to compare the performance of each filter. 
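
Both error measures can be computed in one pass over the image. The sketch below assumes that the normalization is by the summed squared values and the summed absolute values of the original image, respectively, and that images are stored as lists of rows of (R, G, B) tuples. The function name `normalized_errors` is hypothetical.

```python
def normalized_errors(original, estimate):
    """Return (MSE, MAE), each normalized by the corresponding sum over the
    original image. Both images are lists of rows of (R, G, B) tuples."""
    num_sq = den_sq = num_abs = den_abs = 0.0
    for row_o, row_e in zip(original, estimate):
        for px_o, px_e in zip(row_o, row_e):
            for v, w in zip(px_o, px_e):
                num_sq += (v - w) ** 2
                den_sq += v * v
                num_abs += abs(v - w)
                den_abs += abs(v)
    if den_sq == 0 or den_abs == 0:
        raise ValueError("original image is identically zero")
    return num_sq / den_sq, num_abs / den_abs
```

With this normalization both measures are dimensionless, so they do not depend on the number of quantization levels.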
Fig. 1(a) shows a test color image degraded by impulse noise. The probability of
independent noise impulse occurrence is 0.07 in each color channel. This means that
the total noise probability is $P_{RGB} = 1-(1-P)^3 = 1-0.93^3 \approx 0.2$. In the computer simulation, the
values of impulses were set to 0-15 or 240-255 with equal probability. Table 3 shows
the errors under the MSE and MAE criteria for the VMF and the proposed filter. The
size of the moving window is 3x3. The value $\varepsilon_v$ used to construct the spatially connected
channel neighborhoods is equal to 10. The threshold values are taken as Th_ICON=1
and Th_UCON=4. The first two rows in the table (marked WN) show the errors after filtering of the
original image without noise. We see that in this case the conventional VMF performs
worse than the proposed algorithm. Fig. 1(b, c) shows the
filtered images obtained from the noisy image in Fig. 1(a) with the conventional
VMF and the proposed filter, respectively. The proposed filter, which uses the spatial pixel
connectivity, suppresses impulse noise strongly and preserves fine structures and
details very well. The visual comparison of the filtered
images in Fig. 1(b) and 1(c) shows that the image filtered with the VMF is much
smoother than the output image after filtering with the proposed method.



Table 3. Impulse noise suppression with different filters

                           Measured errors
Type of filter             MSE      MAE
VMF 3x3 (WN)               0.0173   0.0802
Proposed algorithm (WN)    0.0014   0.0056
Noisy image                0.0897   0.1338
VMF 3x3                    0.0197   0.0887
Proposed algorithm         0.0087   0.0319

Fig. 1. (a) Noisy color image




Fig. 1. (b) Filtered image by VMF 




Fig. 1. (c) Filtered image by the proposed algorithm 

4 Conclusion

In this paper, we have presented a new algorithm for detection and suppression of
impulse noise in color images. The filter makes explicit use of spatial relations
between color image elements. When the input color image is degraded by impulse
noise, extensive testing has shown that the proposed spatially adaptive vector median
filter outperforms the conventional vector median filter in terms of the mean square
error, the mean absolute error, and a subjective visual criterion.

Abstract. In this paper we present a novel method for computing phase-congruency
by automatically selecting the range of scales over which a locally
one-dimensional feature exists. Our method is based on the use of local energy 
computed in a multi-resolution steerable filter framework. We observe the 
behaviour of phase over scale to determine both the type of the underlying 
features and the optimal range of scales over which they exist. This additional 
information can be used to provide a more complete description of image-features 
which can be utilized in a variety of applications that require high-quality 
low-level descriptors. We apply our algorithm to both synthetic and real images. 

Keywords: Phase-congruency, local energy, feature-detection, scale-detection,
steerable filters



1 Introduction 

Phase congruency [7] is a very appealing concept for general feature detection because 
it permits feature detection independent of the actual feature type, i.e. rather than being 
optimized to detect edges or ridges or valleys it can be used to detect almost any type 
of feature in a unified framework. The underlying principle is that phase is constant or 
congruent over all scales at the location of what the human visual system would perceive 
as a locally one-dimensional feature, such as an edge, a ridge or a valley (as opposed 
to two-dimensional features, e.g. junctions) [6]. An advantage of phase-congruency is 
that the type of feature can be classified using the phase-value at which congruency 
occurs. Furthermore, the degree to which phases are congruent can readily be computed 
as a ratio of some ‘ideal’ value and the actual phase-values leading to a measure that is 
contrast and brightness independent. There are interesting parallels between the concept 
of phase-congruency and Lindeberg’s [5] concept of the scale-space edge, which is 
defined as a connected set of points in scale-space: features are found where a certain 
measure persists over a consecutive range of scales. The method presented in this paper 
identifies locations in image signals at which phase takes on a small set of fixed values 
over a range of subsequent scales. 

The use of phase is particularly appealing for a number of reasons: it has been
demonstrated experimentally [8] that most of the information in a signal is stored in the
phase, rather than the amplitude: the phase effectively encodes the 'location' at which
the individual sinusoids contribute to the overall signal. Furthermore, phase is stable not
only under translation but also under geometric deformations and contrast variations [1].

One of the great advantages of the use of phase is that the exact position of a feature 
can be determined easily to sub-pixel accuracy [3] without the need for explicit sub-pixel 
feature detection. 

Morrone and Owens [7] show that for a one-dimensional signal $I(x)$ which has the
short-term Fourier Transform expansion

$$I(x) = \sum_{n>0} A_n \cos(n\omega x + \phi_n) = \sum_{n>0} A_n \cos(\phi_n(x)), \qquad (1)$$

where $A_n$ and $\phi_n(x)$ are respectively the components of amplitude and phase at position
$x$, all phase-components $\phi_n(x)$ are (near) identical at the location $x$ of a feature.

Phase congruency can be shown to be directly related to local energy [10] which in
the one-dimensional case is defined as

$$LE_{1D} = \sqrt{I^2 + H^2}, \qquad (2)$$

where I is the input signal and H its Hilbert transform.
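
The one-dimensional local energy, the square root of I^2 + H^2, can be illustrated with a small standard-library sketch. It assumes a periodic, real-valued signal and obtains the Hilbert transform through the analytic signal computed with a naive O(N^2) DFT. No band-pass filtering is applied, so this is the full-band quantity rather than the per-scale energy used later in the paper. The function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def local_energy_1d(signal):
    """Return sqrt(I^2 + H^2) per sample, H being the Hilbert transform of I."""
    n = len(signal)
    spectrum = dft(signal)
    # Analytic signal: keep DC (and Nyquist), double positive frequencies,
    # zero the negative ones. Its imaginary part is the Hilbert transform.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = dft([w * c for w, c in zip(weights, spectrum)], inverse=True)
    return [abs(z) for z in analytic]
```

For a pure cosine the energy is constant and equal to the amplitude, which makes a convenient sanity check.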

In practice, phase-congruency is obtained by computing local energy at a number
of scales and integrating the resulting coefficients appropriately. Kovesi [4] presents the
first such computationally efficient implementation of phase-congruency. He defines a
phase congruency measure $PC(x)$ at signal location $x$ as:

$$PC(x) = \max_{\bar{\phi}(x)} \frac{\sum_{n>0} A_n \cos\left(\phi_n(x) - \bar{\phi}(x)\right)}{\sum_{n>0} A_n} = \frac{LE}{\sum_{n>0} A_n}. \qquad (3)$$
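
The maximization over the mean phase has a closed form: the maximum of the amplitude-weighted cosine sum equals the magnitude of the complex sum of A_n exp(i phi_n), and it is attained at the argument of that sum. The sketch below uses this identity; it takes the amplitudes and phases at one signal location as given, and the function name `phase_congruency` is hypothetical.

```python
import cmath


def phase_congruency(amplitudes, phases):
    """Return (PC, mean_phase) for one location, given per-scale amplitudes
    A_n and phases phi_n (in radians)."""
    total = sum(amplitudes)
    if total == 0:
        return 0.0, 0.0          # no signal energy: congruency undefined
    # |sum A_n exp(i phi_n)| is the maximum over phi_bar of
    # sum A_n cos(phi_n - phi_bar); the maximizer is the argument of the sum.
    resultant = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(resultant) / total, cmath.phase(resultant)
```

Identical phases give a value of 1 regardless of the amplitudes, while opposing phases of equal amplitude cancel to 0, which reflects the contrast independence mentioned in the introduction.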



The value of $\bar{\phi}(x)$ that maximizes (3) is the amplitude weighted mean local phase-angle.
It can be shown that the numerator LE in the above expression is the local energy
of the signal [10]; PC is consequently the (local) maximum of the amplitude-weighted
sum of the phases (computed over a range of scales) normalized by the amplitude sum.
Kovesi extends the concept to two dimensional signals and addresses issues such as noise
or advanced phase-congruency based operators for symmetry and asymmetry. Although
PC gives good results for a wide range of synthetic and natural images and produces
feature maps that compare favourably with e.g. a Canny edge detector [4], there are a
number of drawbacks which are related to one major oversight: the fact that local energy
is proportional to phase-congruency (see (3)) is only valid for isolated features and is no
longer valid as soon as the spatial extent of the filter used overlaps neighbouring features
in the input signal. This is illustrated in Fig. 1 which shows in (A) a signal consisting
of a square wave with decreasing frequency (from left to right). Subfigure (B) is the
energy amplitude scaleogram (computed using (2) over a range of centre frequencies
$\lambda$ = 0.1 : 0.65). The corresponding phaseogram is shown in Subfig. (C). A number of
observations are immediately obvious:

- The range of scales over which edges have significant energy is much larger than 
the rather small range of scales over which ridges and valleys respond. 



Improving Phase-Congruency Based Feature Detection 



(A): input signal

Fig. 1. Locality of energy response: the energy distribution over scale depends not only on the
underlying feature, but also its neighbourhood. See text for details.



- There is interference as filters overlap neighbouring features leading to the cancel- 
lation of responses which in (B) correspond to the ‘black holes’ along the centre of 
the scaleogram (or alternatively, the regions in (C) where two columns of constant 
phase merge). In these regions phase is unstable and must not be used for further 
processing [3]. 

- A second form of interference occurs towards the low-frequencies (small $\lambda$): responses
due to combinations of neighbouring features (rather than isolated features) add up
to high energy amplitudes as shown by the broad high intensity curve in the scaleogram.
The response is high even in between what we would regard as features and
therefore must be excluded from further processing.

Since PC is computed over all scales, rather than the appropriate range of scales and 
- as illustrated above - strong features (edges) dominate neighbourhoods, PC responds 
predominantly to edges and isolated very thin lines and produces spurious responses in 
areas with high energy which is due to interference of low-frequency responses. This is 
illustrated in Fig. 2 and Fig. 3. The left of Fig. 2 shows a synthetic image with a variety 
of structures including ridges, valleys and edges over a wide range of orientations, scales 
and with different amounts of smoothing. 

The raw PC output is shown on the right in Fig. 2. There are high responses near 
most of the edges apart from the rather smooth edges in the lower left quadrant. The 
responses at the edges are also significantly stronger than those corresponding to the 
ridges and valleys in between these edges. As a result, the post-processed version shown 
in Fig. 3 marks as interesting features only edges and is rather unsuccessful at identifying 
the smoothed features in the lower left quadrant. The method proposed in this paper 

Fig. 2. Left: Synthetic signal with a variety of locally one-dimensional features (edges, ridges,
valleys). Right: corresponding PC map. Note the strong responses at edges, almost uniform 
response in bottom-left quadrant and weak responses at ridges/valleys. 



identifies a wider range of geometric features than just edges and provides a better 
response to heavily smoothed features, as in the lower left quadrant of the image, thus 
taking advantage of local energy’s inherent ability to respond to any type of locally 
one-dimensional feature equally well. 



2 Methodology 

Our approach is based on the use of local energy for feature detection. At each location
in the image, we compute an energy response, which can be used to obtain an energy-amplitude
and a phase-angle $\phi$. Since we are interested in the energy response and
phase-angle at each image feature independent of its orientation, we interpolate the
exact response at the orientation $\theta_l$ of each image-point through the use of steerable
filters [2]. We define $\theta_l$ to be perpendicular to the locally one-dimensional feature, i.e.
along the orientation of maximum variation. Only regions of the image with high local
energy will be selected for further processing. Since the post-processing stages involve
tracking phase $\phi$ over multiple scales, the signal is decomposed at multiple resolutions.

2.1 Decomposition and Orientation Computation 

Local energy is computed using quadrature pairs of odd-symmetric and even-symmetric
bandpass filters. We decompose the image using a steerable filter bank at $N_o = 4$
odd-symmetric and $N_e = 5$ even-symmetric orientations $\theta_o$ and $\theta_e$ distributed evenly
over the half-circle of orientations $0..\pi$ (i.e. $\theta_o = \frac{\pi}{4}[0, 1, 2, 3]$ and $\theta_e = \frac{\pi}{5}[0, \ldots, 4]$) and
$S = 15$ closely spaced scales (spacing of 0.2 octaves for filters with 1-octave bandwidth)
in order to approximate a continuous decomposition (in scale). In the following, where

Fig. 3. Left: PC map after postprocessing (non-maximum suppression and hysteresis thresholding).
Note the noisy response in bottom left quadrant. Furthermore, only edges are detected. Right:
LE: Ridge (black) and valley (white) feature points overlaid onto original image. See text for more 
details 



appropriate, the scale index s is omitted to improve legibility. The polar-separable filters 
are constructed in the Fourier domain with a cosine raised to the power 2 on a log-scale as 
the radial component and a quadrature pair of filters for the angular component which has 
a cos^ and | cos^ | cross-section for the odd/even-symmetric filter-parts respectively [9]. 

The image is convolved with the resulting filterbank by multiplying the Fourier
Transform of the image with the individual filters and inverse Fourier transforming. The
resulting subbands are kept at full resolution, i.e. no pyramid scheme is used, which
facilitates the tracking of phase through scale. The 2D extension of (2) is

$$LE_n = \sqrt{O_n^2 + E_n^2}, \qquad (4)$$

where $O_n$, $E_n$ are the odd/even-symmetric responses for each of the $N_o$ orientations.
Since $N_o \ne N_e$ and the odd-symmetric and even-symmetric filters consequently are not
aligned, they need to be aligned prior to computing LE. This is achieved by steering the
even-symmetric filters to the orientations $\theta_o$.

Orientation Computation. At each point and for every scale, the local orientation $\theta_l$ is
found as

$$\theta_l = \tfrac{1}{2} \arctan\left(\mathrm{imag}(V) / \mathrm{real}(V)\right), \qquad (5)$$

where $V = \sum_{n} \exp(i 2 n \pi / N_o) |LE_n|$ is the amplitude weighted sum of oriented
unit length vectors aligned with $\theta_o$.

At each point and scale $s$, the $N_o$ subband coefficients are steered to $\theta_l$. The result
of this operation is a cuboid of S steered energy response-maps.
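
The orientation estimate can be sketched as a double-angle vector sum. The fragment below is an illustration under explicit assumptions: the orientations are taken to be n*pi/N_o for n = 0..N_o-1, the halving of the angle is assumed because the exponent uses the doubled angle, and the result is reduced to the half-circle [0, pi). The function name `local_orientation` is hypothetical.

```python
import cmath
import math


def local_orientation(energies):
    """Estimate the local orientation in [0, pi) from the local-energy
    amplitudes measured at N_o evenly spaced orientations n*pi/N_o."""
    n_o = len(energies)
    # Double-angle representation: orientation theta and theta + pi coincide.
    v = sum(abs(e) * cmath.exp(2j * math.pi * n / n_o)
            for n, e in enumerate(energies))
    # Assumption: halve the argument to undo the angle doubling.
    return (0.5 * cmath.phase(v)) % math.pi
```

When the energies are equal at all orientations the vector sum is close to zero and the estimate is not meaningful; a practical implementation would test the magnitude of the sum before using the angle.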

2.2 Tracking Phase over Scale

The basic idea for determining the correct range of scales over which to compute phase-congruency
is to track the precise, i.e. sub-pixel, contours of phase over scale and to label
a location in the image as 'interesting' if phase is congruent over a minimum number of
scales. The candidate points are obtained using the Matlab function contourc on each
response map of the cuboid using the appropriate phase-angles for positive/negative
edges and ridges and valleys.

In order to avoid having to exhaustively search the entire image, we use hysteresis
thresholding of local energy amplitude to reduce the number of candidate points.

For each of the candidate points p (equally spaced along a phase-contour), we iterate
through all scales $\lambda$. If p has correspondences with the same phase-value in a minimum
number $l_{min}$ of subsequent scales, then p is marked as a feature point for that particular
phase-value. The start and end scales $\lambda_b$ and $\lambda_e$ over which phase is congruent are recorded.
All points in the chain from $\lambda_b$ to $\lambda_e$ are removed from the set of candidate points, i.e.
they will not be considered at subsequent stages. Correspondence is established if the
contours at subsequent scales are no further apart (measured as the distance between
two parallel contours) than a specified fraction $d_{max}$ of a pixel. The subpixel location of
p is recorded as the mean of all points in the chain.

Through extensive experimentation on a large set of images we found that the values
$d_{max}$ = 0.3 pixels and $l_{min}$ = 4 scales give excellent results for a wide range of images.
The choice of $d_{max}$ is not critical; values in the range 0.2 < $d_{max}$ < 0.5 give acceptable
results. The choice of $l_{min}$ directly affects

1) the sensitivity to the degree of smoothing a feature has undergone: strong smoothing
leads to responses with a small frequency spread, i.e. there are only significant responses
over a (very) small range of scales. Large values of $l_{min}$ therefore eliminate strongly
smoothed features and favour sharp transitions. Since edges exist over a wider range of
scales than lines, too large a value for $l_{min}$ would therefore favour edges over lines.

2) How features in closely packed neighbourhoods are treated: as the filter-size increases,
the filters quickly start to overlap multiple features, leading to interference in responses.
As a result, stability in phase over scale is only guaranteed over a small range of scales.
If $l_{min}$ is too large, regions with densely packed features are eliminated. Note that
$l_{min}$ = 4 is a value we obtained empirically for a decomposition over S = 15 scales with a
corresponding step-size of 0.2 octaves. $l_{min}$ = 4 therefore implies that a feature needs to
exist over just under one octave.

The method is stable with respect to the thresholds: small changes lead to gradual
changes in the output, rather than catastrophic failure.
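
The chaining of candidate points across scales can be sketched in one dimension. The fragment below is a simplified illustration, not the authors' Matlab code: it assumes that the candidates for one phase value are given as a list of sub-pixel positions per scale, that correspondence is tested against the point found at the previous scale, and that a chain must span at least `l_min` consecutive scales including its starting scale. The function name `track_phase_over_scale` is hypothetical.

```python
def track_phase_over_scale(candidates, d_max=0.3, l_min=4):
    """candidates[s] lists sub-pixel positions at scale index s.
    Returns a list of (mean_position, start_scale, end_scale) features."""
    remaining = [list(points) for points in candidates]
    features = []
    for start in range(len(remaining)):
        for p in list(remaining[start]):
            chain, current = [p], p
            for s in range(start + 1, len(remaining)):
                near = [q for q in remaining[s] if abs(q - current) <= d_max]
                if not near:
                    break                 # phase no longer congruent
                current = min(near, key=lambda q: abs(q - current))
                chain.append(current)
            if len(chain) >= l_min:
                end = start + len(chain) - 1
                # Sub-pixel location: mean of all points in the chain.
                features.append((sum(chain) / len(chain), start, end))
                # Chained points are not considered again.
                for s, q in zip(range(start, end + 1), chain):
                    remaining[s].remove(q)
    return features
```

A point that drifts by more than `d_max` between two neighbouring scales ends its chain, so short chains caused by interference from neighbouring features are discarded when they span fewer than `l_min` scales.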



3 Results 

The methods described in the previous sections were applied to a number of synthetic and
real images with varying noise and contrast levels and different densities of features. The
results of LE are displayed as follows: since the location of the feature-point is computed
at subpixel accuracy, the resulting points are displayed as short lines perpendicular to
$\theta_l$, i.e. parallel to the orientation of the locally one-dimensional feature.

Fig. 4. Top: Fingerprint. Bottom: Writing on floor. See text for details



The right of Fig. 3 shows the ridges and valleys found in the synthetic signal of Fig. 2 
overlaid in black and white respectively. For clarity the edges were not marked. Note 
that we have found and marked a range of features which were completely missed by 
PC as shown on the left in Fig. 3. In particular, we have found the relevant ridges and 
valleys independently of their actual width not only in the regions with sharp transitions, 
but also in the smoothed region of the image. It is clear that broad ridges and valleys 
towards the edge of the image are not marked, this is due to the fact that the filters used 
were not large enough to respond to features of this size. As expected, the edges of these 
features are detected however. This example also illustrates one of the problems of the 
method: very fine lines or edges in neighbourhoods with closely spaced features are not 
detected reliably as shown e.g. in the centre of the image. At these locations, phase is 
only valid over a very limited range of scales, due to interference from neighbouring 
features. This is however a property of phase-congruency per se, rather than the fault of 
LE. 

In the two images at the top of Fig. 4 the ridges found in an image of a fingerprint 
are highlighted. The left image shows the output of the LE method (for clarity, the 
edges and valleys were not marked). Despite the noise-levels and variation in width, 
the exact centres of the ridges have been marked correctly. This means that not only can
we detect the ridges, but we can also describe them in terms of width, orientation etc. The
right image shows the output of PC. As with the other examples, the edges dominate 
most other features. Additionally, because the features are relatively closely spaced, 
there is significant interference between neighbouring features, leading to a high level 
of spurious, noisy responses. 

The two images at the bottom of Fig. 4 show the image of a piece of writing on a
floor. Note how the width of the letters gets narrower towards the top of the image (rear 
of the scene). On the left we see the result of LE. Despite this considerable variation in 
width of the lines, all letters were detected correctly. Note however, that the faint lines on 
the floor have not been marked. This is due to the use of the global hysteresis thresholds 
[0.15, 0.30] of the maximum energy amplitude. Since local energy is computed at a local 
level as the name implies, a future improvement would be to use hysteresis thresholding 
based on a local rather than a global threshold. The image on the right shows the output 
of hysteresis thresholded PC. Once again, where lines are broad enough to have distinct 
edges, the edges dominate the broad lines. 
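The global hysteresis thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation used for Fig. 4: the function name `hysteresis_threshold`, the nested-list energy map and the use of 8-connectivity are all assumptions. It shows why a faint response that is not connected to a strong one is discarded when the thresholds are fractions of the global maximum.

```python
from collections import deque


def hysteresis_threshold(energy, low_frac=0.15, high_frac=0.30):
    """Return a boolean mask of the pixels kept by global hysteresis thresholding.

    `energy` is a 2-D list of non-negative responses. The two thresholds are
    fractions of the global maximum, as in the text; 8-connectivity is an
    assumption of this sketch.
    """
    rows, cols = len(energy), len(energy[0])
    peak = max(max(row) for row in energy)
    low, high = low_frac * peak, high_frac * peak

    keep = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed with every pixel above the high threshold.
    for r in range(rows):
        for c in range(cols):
            if energy[r][c] >= high:
                keep[r][c] = True
                queue.append((r, c))
    # Grow the seeds into neighbouring pixels that exceed the low threshold.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not keep[rr][cc] and energy[rr][cc] >= low):
                    keep[rr][cc] = True
                    queue.append((rr, cc))
    return keep
```

A local variant, as suggested in the text, would replace the global `peak` by a maximum taken over a window around each pixel.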

4 Discussion and Conclusions 

We have introduced a novel method, LE, for computing phase congruency over the
appropriate range of scales, rather than averaging over all scales. The features obtained
using LE are a clear improvement over existing phase-congruency implementations and
a further step towards a truly general feature detector.

The correct range of scales can be used to further describe a feature, and the position
of a feature is computed to subpixel precision. We have demonstrated the usefulness of
the new method using both synthetic and real images.

Abstract. Precise knowledge of the statistical properties of
synthetic aperture radar (SAR) data plays a central role in image
processing and understanding. These properties can be used to
discriminate types of land use and to develop specialized filters
for speckle noise reduction, among other applications. In this work
we assume the $\mathcal{G}_A^0$ distribution as the universal model for multilook
amplitude SAR images under the multiplicative model. We study some
important properties of this distribution and some classical estimators
for its parameters, such as Maximum Likelihood (ML) estimators. These
estimators, however, can be highly influenced by small percentages of
'outliers', i.e., observations that do not fully obey the basic
assumptions. Hence, it is important to find robust estimators. One of
the best known classes of robust techniques is that of M-estimators,
which are an extension of the ML estimation method. We compare these
estimation procedures by means of a Monte Carlo experiment.

Keywords: Robust Estimation, SAR Images, Speckle Noise, Monte 
Carlo. 



1 Introduction 

The last decade was marked by the establishment of SAR images as a tool for earth
monitoring. Several studies have confirmed their relevance, and image processing
techniques have been developed specifically for them. Most SAR image processing
techniques are based on the statistical properties of SAR data. These properties
can be used to develop tools for SAR image processing and analysis, for instance
filters to reduce speckle noise, as well as classification and segmentation algorithms.

There are many statistical models for synthetic aperture radar (SAR) images.
Among them, the multiplicative model is based on the assumption that the
observed random field Z is the product of two independent and unobserved
random fields, X and Y. The random field X models the terrain backscatter
and thus depends only on the type of area each pixel belongs to. The random
field Y takes into account that SAR images are the result of a coherent imaging
system, which produces the well known phenomenon called speckle noise, and that
they are generated by averaging L independent image looks in order to reduce the
speckle effect. This assumes that X and Y are both weakly stationary stochastic
processes. The last fact is based on the assumption that the speckle noise
corresponding to cells of different resolution is generated by the interaction
of many independent dispersion points. Speckle refers to a noise-like
characteristic produced by coherent systems, including sonar, laser, ultrasound
and synthetic aperture radars. It is evident as a random structure of picture
elements caused by the interference of electromagnetic waves scattered from
surfaces or objects.

Fig. 1. Meaning of the $\alpha$ parameter of the $\mathcal{G}_A^0$ distribution in SAR images.

There are various ways of modelling the random fields X and Y. Classically,
both the speckle noise Y and the backscatter X have been modelled with a
$\Gamma^{1/2}$ distribution [TCG82]. This parametrization makes the return Z obey
the $\mathcal{K}_A$ distribution. The $\mathcal{K}_A$ distribution fails to model many
situations where the return is extremely heterogeneous, besides being
computationally cumbersome.

On the other hand, the $\Gamma^{-1/2}$ distribution was proposed in [FMYS97] to
model the amplitude backscatter X. This new model, when used along with the
classical one for the speckle noise, yields a new distribution for the return,
called $\mathcal{G}_A^0$. The advantage of the $\mathcal{G}_A^0$ distribution over the classical
$\mathcal{K}_A$ distribution is that it models extremely heterogeneous areas like cities
very well, as well as moderately heterogeneous areas like forests and homogeneous
areas like crops.

The $\mathcal{G}_A^0$ distribution is characterized by as many parameters as the $\mathcal{K}_A$
distribution: the number of looks (L), a scale parameter ($\gamma$) and a roughness
parameter ($\alpha$). Besides these advantages, the proposed distribution has the
same convenient interpretational properties as the $\mathcal{K}_A$ distribution, see
[FMYS97]. The parameter $\gamma$ is a scale parameter and is related to the relative
power between reflected and incident signals. The parameter $\alpha$ is of particular
interest in many applications, since it is directly related to the roughness of
the target. Figure 1 shows how the $\alpha$ parameter can be used to make inferences
about the type of land seen in a particular SAR image.

Figure 2 is representative of the typical complexity of real SAR images, in
which several types of roughness or texture can be distinguished. This work
discusses the problem of estimating the parameters of the $\mathcal{G}_A^0$ distribution
for the single look case. Two typical estimation situations arise in image
processing and analysis, namely large and small samples; the latter is considered
in this work. Statistical inference with small samples is subject to many problems,
mainly bias, large variance and sensitivity to deviations from the hypothesized
model. The last issue is also a problem when dealing with large samples.

Fig. 2. SAR image of a Chilean copper mine.

Robustness is a desirable property for estimators, since it allows their use even
in situations where the quality of the input data is below the level accepted
by standards [HRRS86]. Most image processing and analysis procedures, like
classification, restoration and segmentation, use field data. A situation where
this occurs is when ground control points (GCP), which are essential for data
calibration, appear in the SAR image. These points produce a return higher than
the rest of the image; for this reason they are called corner reflectors. If data
from a corner reflector are included and the estimation procedure is non-robust,
the results may be completely unreliable.

In Section 2 a brief explanation of the $\mathcal{G}_A^0$ distribution is presented,
together with the classical maximum likelihood estimators of its parameters.
Section 3 presents the robust M-estimators, which are capable of dealing with
imperfect data. In Section 4 the estimation procedures are compared by means of
a Monte Carlo study.

2 The $\mathcal{G}_A^0$ Distribution

The general (multilook) form of the density which characterizes the
$\mathcal{G}_A^0(\alpha, \gamma, L)$ distribution is given in [FMYS97] as

$$f(z) = \frac{2 L^L\, \Gamma(L-\alpha)}{\gamma^{\alpha}\, \Gamma(L)\, \Gamma(-\alpha)} \, \frac{z^{2L-1}}{(\gamma + L z^2)^{L-\alpha}}, \qquad z > 0, \tag{1}$$

where $\alpha < 0$ is referred to as the roughness parameter, $\gamma > 0$ is a scale
parameter and $L \geq 1$ is the number of looks. The number of looks is controlled
in the early generation steps of the image, and is either known beforehand or
estimated using extended homogeneous targets. This parameter remains constant over
the whole image. This law was originally devised to describe extremely heterogeneous
clutter, and was later proposed and assessed as a universal model for speckled
imagery in [MFJB01]. Improved estimation using bootstrap for the parameters
$\alpha$ and $\gamma$ of this distribution is presented in [CFS02], while the robustness
for the L = 1 case is studied in [BLF02] using M-estimators.

The single look case is of particular interest, and it will be considered here,
since it describes the noisiest images. The distribution of interest is then
characterized by the density

$$f(z; (\alpha,\gamma)) = \frac{-2\alpha z}{\gamma^{\alpha} (\gamma + z^2)^{1-\alpha}} = \frac{-2\alpha z}{\gamma \left(1 + z^2/\gamma\right)^{1-\alpha}}, \qquad z > 0, \tag{2}$$

with $-\alpha, \gamma > 0$. This distribution will be denoted $\mathcal{G}_A^0(\alpha,\gamma)$; its
cumulative distribution function is given by

$$F(z; (\alpha,\gamma)) = 1 - \left(1 + z^2/\gamma\right)^{\alpha}. \tag{3}$$
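Because the cumulative distribution function (3) has a closed form, it can be inverted to draw single-look $\mathcal{G}_A^0(\alpha,\gamma)$ samples. The sketch below is illustrative only: the function names are hypothetical and the inverse-transform approach is an assumption, not necessarily how the samples in this paper were generated. Solving $u = 1 - (1 + z^2/\gamma)^{\alpha}$ for z gives $z = \sqrt{\gamma\,((1-u)^{1/\alpha} - 1)}$.

```python
import math
import random


def ga0_cdf(z, alpha, gamma):
    """Equation (3): F(z) = 1 - (1 + z^2 / gamma)^alpha, for z > 0."""
    return 1.0 - (1.0 + z * z / gamma) ** alpha


def ga0_quantile(u, alpha, gamma):
    """Inverse of equation (3) for 0 <= u < 1, with alpha < 0 and gamma > 0."""
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, gamma * ((1.0 - u) ** (1.0 / alpha) - 1.0)))


def ga0_sample(n, alpha, gamma, seed=0):
    """Draw n single-look observations by inverse-transform sampling."""
    rng = random.Random(seed)
    return [ga0_quantile(rng.random(), alpha, gamma) for _ in range(n)]
```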



Several parameter estimation techniques are available, the most notable being
those based on sample moments and maximum likelihood. The k-th order moment of
the $\mathcal{G}_A^0(\alpha, \gamma)$ distribution is given by

$$E(Z^k) = \begin{cases} \gamma^{k/2}\, \dfrac{k\, \Gamma(k/2)\, \Gamma(-\alpha - k/2)}{2\, \Gamma(-\alpha)} & \text{if } -\alpha > k/2, \\ \infty & \text{otherwise.} \end{cases} \tag{4}$$
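Equation (4) can be evaluated directly, and setting k = 1 gives the scale $\gamma$ that yields a unit mean for a given $\alpha$: $\gamma = \left(2\,\Gamma(-\alpha) / (\sqrt{\pi}\,\Gamma(-\alpha - 1/2))\right)^2$, valid for $-\alpha > 1/2$. The following sketch uses hypothetical function names and works with log-gamma values to avoid overflow.

```python
import math


def ga0_moment(k, alpha, gamma):
    """k-th moment of the single-look distribution, equation (4), for k > 0."""
    if -alpha <= k / 2.0:
        return math.inf
    log_m = (0.5 * k * math.log(gamma) + math.log(k)
             + math.lgamma(k / 2.0) + math.lgamma(-alpha - k / 2.0)
             - math.log(2.0) - math.lgamma(-alpha))
    return math.exp(log_m)


def gamma_for_unit_mean(alpha):
    """Scale parameter for which E(Z) = 1, obtained from equation (4) with k = 1."""
    if -alpha <= 0.5:
        raise ValueError("E(Z) is infinite unless -alpha > 1/2")
    log_root = (math.log(2.0) + math.lgamma(-alpha)
                - 0.5 * math.log(math.pi) - math.lgamma(-alpha - 0.5))
    return math.exp(2.0 * log_root)
```

For example, for $\alpha = -1$ this gives $\gamma = (2/\pi)^2$, and the second moment is infinite because $-\alpha > k/2$ fails for k = 2.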



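The maximum likelihood estimator of $\theta = (\alpha, \gamma)$, based on the observations
$z_1, z_2, \ldots, z_N$, is defined as the value $\hat\theta_{ML}$ which maximizes
$\prod_{i=1}^N f_\theta(z_i)$, or equivalently as the value $\hat\theta_{ML}$ which minimizes
$-\sum_{i=1}^N \ln f_\theta(z_i)$. Equating to zero the derivatives of this function, we get

$$\sum_{i=1}^N s(z_i; \theta) = 0, \tag{5}$$

where $s(z;\theta) = (s_1(z;\theta), s_2(z;\theta))^T = \left(\frac{\partial}{\partial\alpha} \ln f_\theta(z), \frac{\partial}{\partial\gamma} \ln f_\theta(z)\right)^T$
denotes the vector of likelihood scores. Explicitly, the score functions are:

$$\begin{cases} s_1(z;\theta) = \dfrac{1}{\alpha} + \ln\left(1 + \dfrac{z^2}{\gamma}\right), \\[2ex] s_2(z;\theta) = -\dfrac{\alpha}{\gamma} - \dfrac{1-\alpha}{\gamma + z^2}. \end{cases} \tag{6}$$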
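When $\gamma$ is treated as known, the first component of (5) can be solved in closed form: $N/\alpha + \sum_i \ln(1 + z_i^2/\gamma) = 0$ gives $\hat\alpha = -N / \sum_i \ln(1 + z_i^2/\gamma)$. The sketch below illustrates this special case only; the assumption of a known $\gamma$ and all function names are ours, and the joint estimator of $(\alpha, \gamma)$ referred to in the text is not reproduced here.

```python
import math
import random


def score_alpha(z, alpha, gamma):
    """First score function of equation (6)."""
    return 1.0 / alpha + math.log1p(z * z / gamma)


def ml_alpha(sample, gamma):
    """ML estimate of alpha for known gamma (root of the summed first score)."""
    total = sum(math.log1p(z * z / gamma) for z in sample)
    return -len(sample) / total


def draw_sample(n, alpha, gamma, seed=0):
    """Inverse-transform draws from the single-look distribution (test data only)."""
    rng = random.Random(seed)
    return [math.sqrt(max(0.0, gamma * ((1.0 - rng.random()) ** (1.0 / alpha) - 1.0)))
            for _ in range(n)]
```

With a large clean sample the estimate should lie close to the true roughness, and the summed score evaluated at the estimate should vanish up to rounding.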



From equations (5) and (6), following [MFJB01], we derive, for the single
look case, the ML-estimator $\hat\theta_{ML} = (\hat\alpha_{ML}, \hat\gamma_{ML})$ as:

3 Robust Estimators

As previously seen, the parameter $\alpha$ of the $\mathcal{G}_A^0$ distribution is defined for
negative values. For values of $\alpha$ near zero, the sampled area presents very
heterogeneous gray values, as is the case for urban areas. As we move to less
heterogeneous areas like forests, the value of $\alpha$ diminishes, reaching its lowest
values for homogeneous areas like crops. This is the reason why this parameter is
regarded as a roughness or texture parameter (recall Figure 1).

Corner reflectors can be considered as additive outliers in SAR imagery: they are
physical equipment in the sensed area that returns most of the power it receives.
The image in these areas is dominated by the largest values admitted by the
storage characteristics, and their effect is typically limited to a few pixels.
Corner reflectors are either placed on purpose for image calibration, or arise
from man-made objects, such as highly reflective urban areas, or from
double-bounce reflection [OQ98].

In practice, it is necessary to use procedures that behave fairly well under
deviations from the assumed model; such procedures are called robust. One
of the best known classes of robust estimators is that of M-estimators, which
are a generalization of the ML-estimators [AGV01]. In this work, we use them to
estimate the parameters of the $\mathcal{G}_A^0$ distribution. These estimators, based on a
sample $z_1, z_2, \ldots, z_N$, are defined as the solution $\hat\theta_M$ of the estimation
equation

$$\sum_{i=1}^N \psi[s(z_i; \theta)] = 0. \tag{8}$$



Equation (8) is a generalization of the maximum likelihood equation (5). The
function $\psi$ is a composition of the score function (6) and Huber's function,
given by $\psi_b(y) = \min\{b, \max\{y, -b\}\}$, where b is called the tuning parameter.
The importance of the $\psi$ functions is that they truncate the scores of the
influential observations in the likelihood equation. Many theoretical results
concerning the asymptotic and robustness properties of M-estimators are available
in the literature [AGV01], [BLF02], [RV02]. On the other hand, it is possible to
consider M-estimators with asymmetrical influence functions [AFGP03], which depend
on the underlying distributions.
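The truncation performed by Huber's function can be illustrated on the first score of equation (6). This is a small sketch with hypothetical function names; the tuning value b = 1.5 and the sample values are arbitrary choices for illustration, not values taken from this paper.

```python
import math


def huber_psi(y, b):
    """Huber's function: psi_b(y) = min(b, max(y, -b))."""
    return min(b, max(y, -b))


def score_alpha(z, alpha, gamma):
    """First score function of equation (6)."""
    return 1.0 / alpha + math.log1p(z * z / gamma)


def truncated_scores(sample, alpha, gamma, b):
    """Scores after truncation, as they would enter an equation of the form (8)."""
    return [huber_psi(score_alpha(z, alpha, gamma), b) for z in sample]
```

An ordinary observation keeps its score unchanged, while a very large observation (such as a corner reflector return) contributes at most b, which bounds its influence on the estimating equation.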

In order to obtain unbiased and optimal estimators, we redefine the
M-estimator $\hat\theta_M$ as a solution of the equation

$$\sum_{i=1}^N \psi[s(z_i; \theta) - c] = 0, \tag{9}$$

where Fisher consistency is achieved by means of the function c, which
is defined implicitly by

$$\int \psi[s(z; \theta) - c]\, dF(z; \theta) = 0. \tag{10}$$

The rule for determining the tuning parameter b is to require that the asymptotic
relative efficiency of the M-estimator with respect to the ML-estimator, in the
model without outliers, lie between 90% and 95% [MR96].

4 Simulation Study 

A Monte Carlo study is performed in order to assess the behavior of the robust
M-estimator with respect to the ML-estimator. Each sample is assumed to be
contaminated by a fraction $\epsilon$ of outliers of magnitude v. Hence, a sample
$z_1, z_2, \ldots, z_N$ obeys the following data contamination model:

$$F(z; (\alpha,\gamma); \epsilon; v) = (1-\epsilon)\, F(z; (\alpha,\gamma)) + \epsilon\, \delta_v(z), \tag{11}$$

where $\delta_v(z) = \mathbb{1}_{[v, +\infty)}(z)$, with v a very large value as compared to most
of the sample data, which is chosen as a factor of the sample mean.

A numerical comparison is made over R = 1000 different samples generated
by means of (11). Using (4), the parameter $\gamma$ is determined by the given value
of $\alpha$ through the condition E(Z) = 1. The methodology used to compute the
estimates is that described in [MR96].

Tables 1 and 2 show, for both the ML-estimator and the M-estimator and for
several values of the roughness parameter, $\alpha \in \{-1, -6, -10\}$, the sample mean
and the mean square error, defined as $\hat{E}[\hat\alpha] = R^{-1} \sum_{i=1}^{R} \hat\alpha_i$ and
$\mathrm{mse}[\hat\alpha] = \hat{E}[(\hat\alpha - \alpha)^2]$, respectively, where $\alpha$ is the true value of the
parameter and $\hat\alpha$ is its estimator. The simulation study considers the estimates
in several situations, varying the sample size $N \in \{9, 25, 49, 81\}$ and the
contamination level $\epsilon \in \{0\%, 1\%, 5\%, 10\%\}$. The outliers were set to a factor
v = 15 of the sample mean.

The results in the tables show that the ML- and M-estimators exhibit almost
the same behavior when the sample is free of contamination. Moreover, both
methods yield better estimates as the sample size grows. Nevertheless, when
the percentage of outliers increases, the ML-estimators lose accuracy faster than
the M-estimators. In summary, the M-estimators perform comparably to the
ML-estimators on uncontaminated samples and better in all contaminated cases.

5 Conclusions

In this paper different estimators were used to estimate the roughness parameter
$\alpha$ of the $\mathcal{G}_A^0$ distribution for the single look case. In a Monte Carlo study,
classical ML-estimators were compared with robust M-estimators; the latter
performed better than the former whenever the samples were contaminated, for all
the sample sizes and contamination levels considered.
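The contamination model (11) used in the simulation study of Section 4 can be sketched as follows. This is an illustrative reduction, not the procedure of [MR96]: it assumes $\gamma$ is known and fixed by E(Z) = 1, uses the closed-form ML estimate of $\alpha$ for known $\gamma$, and replaces each observation independently with probability $\epsilon$ by an outlier equal to a factor of the sample mean. All function names are hypothetical.

```python
import math
import random


def gamma_for_unit_mean(alpha):
    """Scale parameter giving E(Z) = 1, from equation (4) with k = 1."""
    log_root = (math.log(2.0) + math.lgamma(-alpha)
                - 0.5 * math.log(math.pi) - math.lgamma(-alpha - 0.5))
    return math.exp(2.0 * log_root)


def draw(alpha, gamma, rng):
    """One single-look observation by inverting the CDF (3)."""
    return math.sqrt(max(0.0, gamma * ((1.0 - rng.random()) ** (1.0 / alpha) - 1.0)))


def contaminated_sample(n, alpha, gamma, eps, factor, rng):
    """Sample from model (11): with probability eps an observation becomes v."""
    clean = [draw(alpha, gamma, rng) for _ in range(n)]
    v = factor * sum(clean) / n  # outlier magnitude as a factor of the sample mean
    return [v if rng.random() < eps else z for z in clean]


def ml_alpha(sample, gamma):
    """ML estimate of alpha for known gamma (assumption of this sketch)."""
    return -len(sample) / sum(math.log1p(z * z / gamma) for z in sample)


def mean_ml_estimate(alpha, n, eps, factor=15.0, reps=1000, seed=0):
    """Average ML estimate over `reps` Monte Carlo replications."""
    rng = random.Random(seed)
    gamma = gamma_for_unit_mean(alpha)
    estimates = [ml_alpha(contaminated_sample(n, alpha, gamma, eps, factor, rng), gamma)
                 for _ in range(reps)]
    return sum(estimates) / reps
```

Calling `mean_ml_estimate(-6.0, 49, 0.0)` and `mean_ml_estimate(-6.0, 49, 0.10)` shows the qualitative effect discussed in the text: under contamination the average ML estimate moves markedly towards zero.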



Robust Estimation of Roughness Parameter in SAR Amplitude Images 135 

Table 1. Numerical comparison of the mean between ML and M estimators, for varying
$\alpha$, sample size N and contamination level $\epsilon$, with v = 15. Columns labelled ML
and M give $\hat{E}[\hat\alpha_{ML}]$ and $\hat{E}[\hat\alpha_{M}]$, respectively.

                  α = -1               α = -6               α = -10
   ε      N      ML        M          ML        M          ML         M
   0%     9    -1.162   -1.140      -6.508   -6.507      -9.997    -9.997
         25    -1.048   -1.041      -6.265   -6.264     -10.295   -10.295
         49    -1.013   -1.004      -6.114   -6.114     -10.175   -10.175
         81    -1.014   -1.012      -6.060   -6.060     -10.123   -10.123
   1%     9    -0.682   -0.920      -1.818   -2.801      -2.298    -3.343
         25    -0.837   -0.943      -3.245   -4.355      -4.432    -5.957
         49    -0.894   -0.957      -4.042   -4.937      -5.961    -7.379
         81    -0.922   -0.967      -4.464   -5.190      -6.808    -8.036
   5%     9    -0.668   -0.909      -1.691   -2.592      -2.130    -3.080
         25    -0.767   -0.900      -2.701   -3.787      -3.695    -5.080
         49    -0.796   -0.905      -3.112   -4.146      -4.286    -5.700
         81    -0.802   -0.908      -3.156   -4.183      -4.346    -5.771
  10%     9    -0.638   -0.886      -1.553   -2.365      -1.957    -2.798
         25    -0.701   -0.861      -2.147   -3.111      -2.835    -3.975
         49    -0.681   -0.830      -2.136   -3.110      -2.877    -4.066
         81    -0.666   -0.814      -2.068   -3.052      -2.752    -3.941

Table 2. Numerical comparison of the mean square error between ML and M estimators,
for varying $\alpha$, sample size N and contamination level $\epsilon$, with v = 15. Columns
labelled ML and M give $\mathrm{mse}[\hat\alpha_{ML}]$ and $\mathrm{mse}[\hat\alpha_{M}]$, respectively.

                  α = -1               α = -6               α = -10
   ε      N      ML        M          ML        M          ML         M
   0%     9     0.218    0.218       5.316    5.320       6.036     6.036
         25     0.046    0.052       1.647    1.647       3.636     3.636
         49     0.021    0.024       0.782    0.782       2.189     2.189
         81     0.014    0.016       0.444    0.444       1.415     1.415
   1%     9     0.114    0.078      17.546   10.494      59.377    44.509
         25     0.045    0.042       7.815    3.225      31.439    17.195
         49     0.026    0.024       4.219    1.637      17.194     8.014
         81     0.016    0.014       2.777    1.051      11.325     4.962
   5%     9     0.124    0.072      18.690   12.054      62.132    48.426
         25     0.074    0.045      11.444    5.793      40.882    26.000
         49     0.058    0.031       9.027    4.165      34.517    20.700
         81     0.050    0.022       8.690    3.835      33.603    19.612
  10%     9     0.148    0.078      19.958   13.810      64.993    52.696
         25     0.110    0.052      15.480    9.404      52.624    38.525
         49     0.114    0.047      15.378    8.976      51.837    36.880
         81     0.120    0.046      15.725    9.086      53.149    37.672

As concluding remarks, one could say that the $\mathcal{G}_A^0$ distribution is quite a good
model for SAR data, whose parameters have a relevant and immediate physical
interpretation. Estimators of these parameters can be used in various ways, for
instance as classification and segmentation tools for SAR images or in the
development of digital filters, among others.

In future work, the simultaneous estimation of the $\alpha$ and $\gamma$ parameters will
be considered. Also, M-estimators will be studied for the multilook case.

Abstract. In texture segmentation it is key to develop descriptors which
provide acceptable results without a significant increase in their time
complexity. In this contribution, we propose two probabilistic texture
descriptors: polarity and texture contrast. These descriptors are related
to the entropy of the local distributions of gradient orientation and
magnitude. As such descriptors are scale-dependent, we propose a simple
method for selecting the optimal scale. Using the features at their
optimal scale, we test the performance of these measures with an adaptive
version of the ACM clustering method, in which adaptation relies on the
Kolmogorov-Smirnov test. Our results with only these two descriptors
are very promising.



1 Introduction 



In the past, there have been many approaches to texture description: Gabor
filters [6], quadrature filters [7], co-occurrence matrices [8], wavelets [9],
second-order eigenstructure [10], and so on. As texture is not a pointwise feature
but relies on a local neighborhood, there are two key problems to consider: (i)
finding a good descriptor, like the ones listed above, and (ii) determining the
optimal size of the neighborhood where such a descriptor is computed. In this paper,
we address these two questions, starting by reviewing two measures, polarity and
texture contrast, which rely on the second-order eigenstructure. Later, we redefine
such measures in terms of entropy and propose a way of automatically selecting the
optimal scale of the measures. Finally, we test these measures in segmentation.

The polarity $P_\sigma$ at a given pixel is defined in [1] as a measure of the extent
to which the gradient vectors $\nabla I$ in a certain neighborhood, defined by the scale
$\sigma$, all point in the same direction:

$$P_\sigma = \frac{|E_+ - E_-|}{E_+ + E_-}, \tag{1}$$

where $E_+$ and $E_-$ are defined as follows:

$$E_+ = \sum_{x,y} G_\sigma(x,y)\, [\nabla I \cdot \hat{n}]_+ \tag{2}$$

and

$$E_- = \sum_{x,y} G_\sigma(x,y)\, [\nabla I \cdot \hat{n}]_-, \tag{3}$$

where $G_\sigma(\cdot)$ is a Gaussian smoothing kernel with variance $\sigma^2$, $[\cdot]_+$ and $[\cdot]_-$
are the rectified positive and negative parts of their arguments, and $\hat{n}$ is a unit
vector perpendicular to $\phi$, the dominant direction in the neighborhood, which in
turn is the argument of the principal eigenvector of the second-moment matrix

$$M_\sigma = \sum_{x,y} G_\sigma(x,y)\, (\nabla I)(\nabla I)^T. \tag{4}$$

Consequently, $E_+$ and $E_-$ measure, respectively, how many gradient vectors in
the window defined by $G_\sigma(\cdot)$ are on the positive side and on the negative side
of $\phi$, and $P_\sigma \in [0,1]$ will be close to zero when $E_+ \approx E_-$, that is, when we
have a flow pattern; and it will be close to unity, for instance, when $E_- \approx 0$ and
$E_+ \gg 0$, that is, when we have an edge.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 137-144, 2003.



2 Entropy-Related Measures 

2.1 Probabilistic Polarity 

As the underlying idea of polarity is that it vanishes as many different
orientations appear in the neighborhood, we propose an alternative definition of
polarity which does not rely on the eigenstructure of the local gradient, but on the
structure of the distribution of local gradient orientations. Thus, the probability
$P_\sigma(z)$ of a given orientation $z \in [0, 2\pi)$ at scale $\sigma$ is defined by

where 

$$W(x,y) = G_\sigma(x,y)\, \|\nabla I\|, \qquad \theta(x,y) = \arctan(I_y / I_x),$$

that is, the weight of a given pixel in the neighborhood and the local orientation
of its gradient, respectively. After quantizing the interval $[0, 2\pi)$ into N bins of
size $\Delta = 2\pi/N$, we define the empirical probability $h_\sigma(k)$, with $k = 0, 1, \ldots, N-1$,
which accumulates all probabilities $P_\sigma(z)$ with $z \in [k\Delta, (k+1)\Delta)$. Using the
latter N-component histogram, the entropy of the distribution is approximated by

$$H_\sigma = -\sum_{k=0}^{N-1} h_\sigma(k) \log h_\sigma(k). \tag{6}$$

In principle, the inverse entropy $1 - H_\sigma$ is a good measure of polarity because
it tends to unity when all gradient vectors in the neighborhood have a similar
orientation (minimal entropy, corresponding to a peaked distribution) and
it vanishes when many different orientations appear (maximal entropy,
corresponding to a uniform distribution). However, the inverse entropy decays too
slowly as the neighborhood is de-polarized. In particular, a distribution with two
close peaks (or one wider peak) and one with the same peaks further apart have
similar entropies. Consequently, the inverse entropy captures the number of peaks
but not their separation, and such a separation, in addition to the appearance of
new peaks, occurs when we progressively de-polarize a texture edge while increasing
the size of its neighborhood. For instance, in Fig. 1 polarity vanishes when
two significant peaks appear in a de-polarized zone, whereas inverse entropy even
increases in the same zone.

Fig. 1. (a) Input image (b) Zoom showing both a polarized and a de-polarized zone
(c) Histogram of the polarized zone: $1 - H_\sigma = 0.2568$, $P_\sigma = 0.2568$ (d) Histogram of
the de-polarized zone: $1 - H_\sigma = 0.3722$, $P_\sigma = 0.0079$
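The insensitivity of entropy to peak separation can be checked numerically. The sketch below rests on two assumptions that the text does not state explicitly: the orientation histogram is built as a gradient-weighted histogram normalised to sum to one, and the entropy of equation (6) is normalised by log N so that it lies in [0, 1]. The function names are hypothetical.

```python
import math


def orientation_histogram(angles, weights, n_bins):
    """Weighted histogram of orientations in [0, 2*pi), normalised to sum to 1."""
    width = 2.0 * math.pi / n_bins
    hist = [0.0] * n_bins
    for angle, weight in zip(angles, weights):
        k = int((angle % (2.0 * math.pi)) / width) % n_bins
        hist[k] += weight
    total = sum(hist)
    return [h / total for h in hist]


def normalised_entropy(hist):
    """Entropy of equation (6), divided by log N (an assumption of this sketch)."""
    return -sum(h * math.log(h) for h in hist if h > 0.0) / math.log(len(hist))


# Two equal peaks in adjacent bins versus the same peaks in opposite bins.
near_peaks = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
far_peaks = [0.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

Both histograms have exactly the same entropy, so the inverse entropy cannot distinguish them, which is the motivation for redefining polarity in terms of peak separation.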

In order to capture peak separation we re-define polarity in terms of the
expression

$$P_\sigma = 1 - \sum_{k=0}^{N-1} h_\sigma(k) \sum_{r=-\lfloor N/2 \rfloor}^{\lfloor (N-1)/2 \rfloor} g(r)\, h_\sigma([k+r]_N), \tag{7}$$

where each component $h_\sigma(k)$ is no longer weighted by its logarithm but by the
result of convolving it with a kernel $g(\cdot)$ defined in such a way that we ensure
that $P_\sigma \in [0, 1]$. For a linear choice we have that
g(r) = ar =



L(JV-1)/2J 
[N/2\ 



Ell-- 



( 8 ) 



We also assume a cyclic histogram, because the orientation domain is also cyclic,
where $[k+r]_N \in \{0, 1, \ldots, N-1\}$ refers to $(k+r) \bmod N$.



2.2 Probabilistic Texture Contrast 

Another texture feature is texture contrast. In [1] it is defined by $2\sqrt{\lambda_1 + \lambda_2}$,
where $\lambda_1$ and $\lambda_2$ are the two eigenvalues of the second-moment matrix $M_\sigma$.
Following the probabilistic rationale above, in order to define texture contrast
we consider the local intensity probabilities



qa{z) = 






(9) 



and proceed to quantize the normalized range of intensities [0, 1], yielding the
M-component histogram $c_\sigma(i)$, $i = 0, \ldots, M-1$, which accumulates the
probabilities $q_\sigma(z)$. Texture contrast must be close to unity when we have
two peaks at maximal distance, and must vanish when the intensity distribution is
peaked. Again, we find that the entropy is not a proper choice and we replace it
by

$$C_\sigma = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} c_\sigma(i)\, c_\sigma(j)\, d(|i-j|), \tag{10}$$

where $d(\cdot)$ is defined in such a way that $C_\sigma \in [0, 1]$. For the simple linear case,
we have that d(r) = 2r/(N - 1). With the latter definition we consider peak
separation through a weighted correlation.
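The weighted-correlation idea can be sketched as below. Two points are assumptions of this sketch rather than statements of the paper: the summand is taken to be the product of two histogram components weighted by a kernel of their bin distance, and the linear kernel is normalised by the number of histogram bins minus one so that two peaks at maximal distance give exactly one. The function names are hypothetical.

```python
def linear_kernel(r, n_bins):
    """Linear distance weight d(r) = 2r / (n_bins - 1); the normalisation is assumed."""
    return 2.0 * r / (n_bins - 1)


def texture_contrast(hist):
    """Weighted correlation of an intensity histogram with itself.

    Each pair of bins (i, j) contributes hist[i] * hist[j] * d(|i - j|),
    so distant peaks raise the contrast and a single peak gives zero.
    """
    m = len(hist)
    return sum(hist[i] * hist[j] * linear_kernel(abs(i - j), m)
               for i in range(m) for j in range(m))
```

With this choice a histogram with two equal peaks in the first and last bins has contrast one, a single-peak histogram has contrast zero, and a uniform histogram falls in between.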



3 Scale Selection 

As the probabilistic measures defined above depend on the scale, we are
interested in a method for selecting the optimal scale for both of them. Little
previous work has been done in this area [2]. We follow the approach
described in [1], in which scale selection relies on polarity analysis. We consider
a sequence of scales $\{\sigma_k\}$, with $k = 0, 1, 2, \ldots, S-1$, and we start by computing
the polarity at the lowest scale, $P_{\sigma_0}$, and assuming that the tentative optimal
scale is $\sigma_0$. Thereafter, we test whether an increment of scale is acceptable.

An increment of scale will always contribute either to de-polarize the pixel,
or to leave its polarity unchanged. Consequently, given $P_{\sigma_k}$ and assuming that
the temporary optimal scale is $\sigma_k$, we compute $P_{\sigma_{k+1}}$ and test whether the
decrement $P_{\sigma_k} - P_{\sigma_{k-1}} = \nabla P_{\sigma_k} < 0$ is low enough. If
$\nabla P_{\sigma_{k+1}} < \nu \nabla P_{\sigma_k}$, with $\nu \in [0, 1]$, then we accept $\sigma_{k+1}$ as a new
temporary optimal scale because such a scale de-polarizes the pixel significantly.
Otherwise, we assume that the optimal scale is $\sigma^* = \sigma_k$. The coefficient $\nu$
modulates the decrement needed to increment the scale: when $\nu \to 0$ we change
scale easily, whereas with $\nu \to 1$ we are more restrictive.

In our experiments we use the set of scales {0.25, 0.5, 1.0, 2.0, 4.0, 8.0}, that
is, S = 6, and we have set $\nu = 0.5$. In Fig. 2 we show the polarity at those
scales, and in Fig. 3 we show some results of optimal scale selection: the
optimal-scale image, with dark greys corresponding to low scales and light greys
corresponding to high scales; the polarity image at the optimal scale (each pixel
with its optimal polarity $P_{\sigma^*}$); and the texture-contrast image at the optimal
scale (each pixel with its optimal texture contrast $C_{\sigma^*}$). Low polarity appears
in light grey and high polarity appears in dark grey. On the other hand, low texture
contrast appears in dark grey and high contrast appears in light grey.

4 Adaptive Segmentation 

4.1 EM Algorithm for Asymmetric Clustering

Given N image blocks $x_1, \ldots, x_N$, each one having associated M possible features
$y_1, \ldots, y_M$, the Asymmetric Clustering Model (ACM) maximizes the log-likelihood

$$L(I, q) = -\sum_{i=1}^{N} \sum_{\alpha=1}^{K} I_{i\alpha}\, KL(p_{j|i}, q_{j|\alpha}), \tag{11}$$

where: $p_{j|i}$ encodes the individual histogram, that is, the empirical probability
of observing each feature $y_j$ given $x_i$; $q_{j|\alpha}$ is the prototypical histogram
associated to one of the K classes $c_\alpha$; $KL(\cdot,\cdot)$ is the symmetric
Kullback-Leibler divergence; and $I_{i\alpha} \in \{0, 1\}$ are class-membership variables.

Fig. 3. Texture features. (a) Optimal-scale image (b) Hue component (c) $P_{\sigma^*}$ image
(d) $C_{\sigma^*}$ image



The following EM algorithm was proposed in [3] [5]. The E-step consists of
estimating the expected membership variables $I_{i\alpha} \in [0, 1]$ given the current
estimation of the prototypical histograms $q_{j|\alpha}$:

$$I_{i\alpha}^{t+1} = \frac{\rho_\alpha^{t} \exp\{-KL(p_{j|i}, q_{j|\alpha}^{t})/T\}}{\sum_{\beta=1}^{K} \rho_\beta^{t} \exp\{-KL(p_{j|i}, q_{j|\beta}^{t})/T\}}, \tag{12}$$

where

$$\rho_\alpha^{t} = \frac{1}{N} \sum_{i=1}^{N} I_{i\alpha}^{t},$$

that is, the probability of assigning any block $x_i$ to class $c_\alpha$ at iteration t,
and T is the temperature, a control parameter which is reduced at each iteration.

In the M-step, given the expected membership variables $I_{i\alpha}^{t+1}$, the
prototypical histograms are re-estimated as follows:

$$q_{j|\alpha}^{t+1} = \sum_{i=1}^{N} \pi_{i\alpha}\, p_{j|i}, \quad \text{where} \quad \pi_{i\alpha} = \frac{I_{i\alpha}^{t+1}}{\sum_{k=1}^{N} I_{k\alpha}^{t+1}}, \tag{13}$$

that is, the prototype consists of the linear combination of all individuals
$p_{j|i}$ weighted by the $\pi_{i\alpha}$.

Fig. 4. Segmentation results. Top: Input images; Middle: Only texture features;
Bottom: Including color features.
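One iteration of the E-step (12) and M-step (13) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [3] [5]: the symmetric divergence is taken as KL(p, q) + KL(q, p), a small constant `EPS` (hypothetical) avoids taking the logarithm of empty bins, the exponentials are shifted for numerical stability, and every class is assumed to keep a non-zero total membership.

```python
import math

EPS = 1e-12  # hypothetical smoothing constant that keeps log() finite for empty bins


def sym_kl(p, q):
    """Symmetric Kullback-Leibler divergence, KL(p, q) + KL(q, p)."""
    return sum((a - b) * math.log((a + EPS) / (b + EPS)) for a, b in zip(p, q))


def e_step(hists, protos, priors, temperature):
    """Expected memberships of every block in every class, as in equation (12)."""
    memberships = []
    for p in hists:
        costs = [sym_kl(p, q) / temperature for q in protos]
        shift = min(costs)  # subtracting the minimum leaves the ratio unchanged
        scores = [rho * math.exp(-(c - shift)) for rho, c in zip(priors, costs)]
        total = sum(scores)
        memberships.append([s / total for s in scores])
    return memberships


def m_step(hists, memberships):
    """Re-estimate prototypes as membership-weighted averages, as in equation (13)."""
    n_blocks, n_classes, n_feats = len(hists), len(memberships[0]), len(hists[0])
    protos, priors = [], []
    for a in range(n_classes):
        mass = sum(row[a] for row in memberships)  # assumed non-zero
        protos.append([sum(row[a] * p[j] for row, p in zip(memberships, hists)) / mass
                       for j in range(n_feats)])
        priors.append(mass / n_blocks)
    return protos, priors
```

In a full run these two steps would alternate while the temperature is lowered, as described in the text.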

In previous work we introduced an adaptation mechanism for ACM [4], in which we
start with a high number of classes and class fusion relies on considering whether
the dispersion of the resulting class is lower than the sum of the dispersions of
the two fusing classes. Herein we propose a method relying on the
Kolmogorov-Smirnov test. As in our earlier work, we assume that the iterative
process is divided into epochs, and our adaptation mechanism consists of starting
with a high number of classes $K_{max}$ and then reducing this number, if appropriate,
at the end of each epoch. At that moment we consider all the K(K - 1)/2 pairs
of prototypes, where K is the current number of classes. For all these pairs we
compute the Kolmogorov-Smirnov statistic resulting from comparing their histograms,
with significance level 0.05, and then we select the pair $q_{j|\alpha}$ and $q_{j|\beta}$
with the lowest statistic. If, with such a statistic, the test does not succeed
(the two histograms are not different enough), we decide to fuse their classes.
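The pair selection and fusion decision can be sketched as below. The Kolmogorov-Smirnov statistic is taken as the largest absolute difference between the cumulative histograms. The critical value uses the standard large-sample two-sample formula, in which the effective sample sizes `n` and `m` behind each histogram are hypothetical parameters, since the text does not say how they are chosen. All function names are hypothetical.

```python
import math
from itertools import combinations


def ks_statistic(p, q):
    """Largest absolute difference between the two cumulative histograms."""
    cum_p = cum_q = 0.0
    largest = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        largest = max(largest, abs(cum_p - cum_q))
    return largest


def ks_critical(n, m, level=0.05):
    """Asymptotic two-sample critical value for effective sample sizes n and m."""
    coeff = math.sqrt(-0.5 * math.log(level / 2.0))
    return coeff * math.sqrt((n + m) / (n * m))


def closest_pair(protos):
    """Indices of the pair of prototypes with the lowest KS statistic."""
    return min(combinations(range(len(protos)), 2),
               key=lambda ab: ks_statistic(protos[ab[0]], protos[ab[1]]))


def should_fuse(p, q, critical_value):
    """Fuse when the test cannot tell the two histograms apart."""
    return ks_statistic(p, q) <= critical_value
```

At the end of an epoch one would call `closest_pair` on the current prototypes and fuse the returned classes only if `should_fuse` holds for them.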

Then, we compute the fused prototype $q_{j|\gamma}$ by applying Equation (13) and
considering that $I_{i\gamma} = I_{i\alpha} + I_{i\beta}$, that is,

$$q_{j|\gamma} = \sum_{i=1}^{N} \pi_{i\gamma}\, p_{j|i}.$$

Then a new epoch starts, and the process continues until convergence is declared.

4.2 Segmentation Results

Dividing our input color images into blocks of 8 x 8 pixels, we consider a histogram
of 16 components for each of polarity, texture contrast and the hue component; thus
the number of features is 32 when only texture is considered, and 48 when color
is included. In Fig. 4 we compare the segmentation results obtained with and
without color information, assuming $K_{max} = 10$. In many cases texture features
are enough to yield acceptable segmentations, although color features usually
improve the quality of the results.

5 Conclusion 

In this paper we have proposed two entropy-related texture features, obtained 
through automatic scale selection. In order to demonstrate their utility in seg- 
mentation we have used them in an adaptive version of the ACM clustering 
model, and the obtained results were promising.

Abstract. In this paper we propose a topological* model for image database
querying using neighborhood graphs. A related neighborhood graph is built from
automatically extracted low-level features, which represent images as points of
$\mathbb{R}^p$ space. Graph exploration corresponds to database browsing; the neighbors of
a node represent similar images. In order to perform query by example, we define
a topological query model. The query image is inserted in the graph by locally
updating the neighborhood graph. The topology of an image database is more
informative than the similarity measures usually applied in content-based image
retrieval, as shown by our experiments.



1 Introduction 

Information retrieval in image databases is still a challenge because users
frequently seek semantically similar images, while an image database provides
similarity only at a low level, using characteristics computed from pixel values.
Visual information retrieval implies the use of an index. There are two approaches
to image indexing [9]: visual content based and annotation based. Visual
content indexing supposes that the visual information of each image (given by pixel
values) is summarized in a feature vector containing low-level features (color
histogram, textural features, shape features). Consequently, the query process is
reduced to a neighbor search inside the representation space [1]. A similarity
measure is defined to identify the neighborhood. In this context, the query starts
with a sample image. Annotation based indexing supposes that each image is annotated
using a keyword, a label or, more generically, a text. Each image is described by
keywords expressing the image semantics. A user searching for an image with certain
semantics can express the request as a list of keywords. A similarity measure can
also be useful to identify a set of images expressing the query semantics.

In this paper we are interested in the concept of the “neighbor” and “neighborhood” of an
image in an image database. Most search algorithms in image databases propose
to seek the k nearest neighbors (kNN) [7] of an image by using a similarity
measure [12]. For instance, the QBIC system [3], in its implementation for the Hermitage
Museum², always returns the 12 nearest neighbors of the sketch presented by the
user as a query in the color search or layer search engine. In some situations, as
illustrated in Section 2, the kNN algorithm produces surprising results compared to
user expectations. In Section 3, we introduce a more appropriate neighborhood
representation model: the topological neighborhood. Section 4 describes the topological
query model and discusses its advantages and limitations. In Section 5 we present
experimental results on an image database, where the two query methods (kNN
and topological neighborhood) are compared on the basis of recall and precision.
Concluding remarks and future work are presented in Section 6.

* The word “topology” denotes here the relationships between elements linked together in a
system. It is used neither in the sense of the mathematical study of the geometric properties of
figures that are independent of size or shape and are preserved through deformations, twisting
and stretching, nor in the sense of a family of subsets (the family of all open subsets of a
mathematical set, including the set itself and the empty set, which is closed under set union and
finite intersection) [12].

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 145-152, 2003.



2 Anomalies and Assumptions of the kNN 

In the context of navigation in an image database, the system is typically driven by
the user. It is therefore more convenient for the user if the system follows the human
cognitive model. Since the user expects to see sets of similar images together, the
system must guarantee the stability of those sets, and we assume that symmetry is
one of the required conditions. Unfortunately, in some situations the kNN model does
not satisfy this condition. We illustrate this with an example based on a 2NN algorithm.
Given the 6 images in Figure 1, their distance matrix, based on the L1 color feature
(sum of image pixel values), is given in Table 1.a.




A B C D E F 



Fig. 1. Image list for the 2NN example



Table 1. The distance matrix (a), the two nearest neighbors in the case of the 2NN algorithm (b)
and the geometrical neighbors (c). On each row, the black cells indicate the elements (columns)
that are neighbors of the row element

(a) (b) (c)



The user runs a query with the image D as the request. The system returns the images B
and C (Table 1.b). The user then expects to find D at least when running the query
with the image C as the request. However, the system returns A and B. These query
results are surprising and even doubtful for the user. The non-symmetry
of kNN has been pointed out several times, but it has seldom been criticized as
leading to cognitively erroneous results.



² http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
To prevent kNN from producing this adverse effect, the points have to be relatively
uniformly distributed in the representation space. This assumption seems too strong to
us and difficult to satisfy. We can overcome this limitation by using topological
models, which are symmetric (Table 1.c).
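
The loss of symmetry is easy to reproduce. The following minimal sketch uses four hypothetical one-dimensional feature values (they are not the values of Table 1, and the function name `knn` is chosen here for illustration) with a 2NN query: image C is returned for D, but D is not returned for C.

```python
def knn(points, query, k):
    """Return the k labels closest to `query` (the query itself is excluded)."""
    others = [label for label in points if label != query]
    others.sort(key=lambda label: abs(points[label] - points[query]))
    return others[:k]

# Hypothetical 1-D feature values, chosen only to illustrate the effect.
points = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 6.0}

neighbors_of_d = knn(points, "D", 2)   # ['C', 'B']
neighbors_of_c = knn(points, "C", 2)   # ['B', 'A']: D is not returned for C
print(neighbors_of_d, neighbors_of_c)
```

The isolated point D always finds neighbors, but it is never close enough to be anyone else's neighbor, which is the kind of non-uniform distribution mentioned above.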



3 Topological Models 

We consider a dataset $\Omega$ composed of n images. Each image i is represented as a
p-dimensional numerical feature vector $X(i) = (X_1(i), \ldots, X_p(i)) \in \mathbb{R}^p$.

Therefore each image is a point in $\mathbb{R}^p$ space. A distance measure can be computed
for each pair of images, for instance the Euclidean or the Cosine distance.

The topology of a dataset defines how data points are connected to one another.
Topology can be represented by a graph, where the data points $X(i)$, for $i \in [1, \ldots, n]$,
represent nodes and the neighborhood relationships denote edges connecting nodes.

Each image is represented as a node in a neighborhood graph. The neighbors of a
node represent similar images. Two points (images) are neighbors and connected by
an edge if they satisfy a specific binary relationship. Many models may be used: Delaunay
triangulation, relative neighborhood graph, Gabriel graph or minimum spanning
tree. We choose the relative neighborhood graph representation for the image
database for the reasons presented below. The binary relationship defined by each
graph is symmetric. For more details about all the graph models and their properties
see [8].

The Relative Neighborhood Graph is a connected graph where two points $\alpha$ and $\beta$ are
neighbors if they satisfy the following property: the lune, corresponding to the dashed
area in Figure 2, must be empty. Two points $\alpha$ and $\beta$ are connected by an edge if the
following condition is satisfied:

$$d(\alpha, \beta) \le \max\big(d(\alpha, \gamma),\, d(\beta, \gamma)\big), \quad \forall \gamma \in \Omega,\ \gamma \ne \alpha, \beta \qquad (1)$$

where $d(a, b)$ is a distance between $a, b \in \Omega$ in $\mathbb{R}^p$.
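
As an illustration of condition (1), the sketch below builds the relative neighborhood graph by brute force. It is only a sketch under stated assumptions: the function name, the use of the Euclidean distance, the treatment of ties (points at exactly equal distance do not block an edge) and the sample coordinates are choices made here, not details given in the paper.

```python
import math
from itertools import combinations

def relative_neighborhood_graph(points):
    """Return the set of RNG edges (i, j), i < j, for a list of coordinate tuples."""
    def d(i, j):
        return math.dist(points[i], points[j])

    edges = set()
    for a, b in combinations(range(len(points)), 2):
        # The lune of (a, b) is empty when no third point is closer to both
        # a and b than they are to each other.
        if all(d(a, b) <= max(d(a, c), d(b, c))
               for c in range(len(points)) if c not in (a, b)):
            edges.add((a, b))
    return edges

# Hypothetical points in the plane (p = 2), not taken from the paper.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (6.0, 0.0)]
edges = relative_neighborhood_graph(points)
print(sorted(edges))   # [(0, 1), (1, 2), (2, 3)]
```

Because the test is the same whichever endpoint is taken first, the neighbor relation is symmetric by construction. This brute-force version examines every triple of points, so it is cubic in the number of images.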




Fig. 2. Example of a Relative Neighborhood Graph in $\mathbb{R}^2$

3.1 Topological Neighborhood

Content based image retrieval is based on similarity measures. In this context, image
retrieval requires an understanding of the notion of image neighborhood. The neighborhood
is generally defined as the set of all points belonging to a given set whose
distances from a given point are less than a given positive number. In order to decide
which points are neighbors, either the geometrical neighborhood or the topological
neighborhood has to be examined. A geometrical neighborhood includes all the points
within a certain distance from the sample point. The kNN algorithm uses the geometrical
neighborhood. A topological neighborhood contains all the points within a certain
number of edges from the desired sample point. Two points are linked by an edge
only if they satisfy the criterion presented above, which does not necessarily imply a
minimal distance.



3.2 Choice of the Topological Neighborhood 

All neighborhood graphs model the similarity between images represented as points
in $\mathbb{R}^p$ space. The relative neighborhood graph is a superset of the minimal spanning tree
and a subset of the Delaunay triangulation [11]. We preferred the relative neighborhood
graph because the definitions of the minimal spanning tree and of the Delaunay
triangulation are global. Therefore, each time we add a new image to the database, or
present a new image in order to perform similar image retrieval, the entire graph needs to
be recalculated. The relative neighborhood graph and the Gabriel graph are equivalent in
this respect: they both have local definitions, and we can easily insert new points in the
graph without rebuilding it. The Gabriel graph has more edges than the relative
neighborhood graph, but in practice the computing time is equivalent. In our tests we used
the relative neighborhood graph.

The relative neighborhood graph is built in $\mathbb{R}^p$ space. For p > 3 the visual
representation can be projected into a low-dimensional space using, for example, principal
components analysis or phylogenic trees [5]. This representation is an alternative to
Kohonen maps [6]. The graph can also be built directly in a given plane, such as the first
factorial plane, if this plane preserves the major part of the information.



4 Topological Query 

Given a query image I and an image database, the user wants to find a set of similar
images. When applying the kNN algorithm, the k images closest in distance
to the query image I are returned. Setting the value of k is another drawback of the kNN
approach, in addition to the non-symmetry. Each image has a variable number of
neighbors, so two situations arise: either the returned results are limited to k items
when the query image has more than k neighbors in the target database,
or the system is forced to return k images when the query image has fewer than k neighbors
in the target database.

Table 2 shows the distance matrix for eight images (presented in Figure 3) belonging
to three semantic categories (“plants”, “fountain” and “mountain”). If we use a query
image similar to A (“plants”), intuitively the system should return A, B and C. If we
set the value of k to 2, we will obtain only two results, say A and B. Another query
has to be performed to obtain the image C, even though it is very similar to the two
others. On the other hand, if k is set to 3 and the query image is similar to D
(“fountain”), the system will return D, E and also the next image closest
to the query image. The distance matrix in our example shows H as the
second nearest neighbor of the image D; therefore the user will obtain a “mountain”
picture, which is visually not similar to the query image representing a fountain.




Fig. 3. Image list for three semantic categories: plants (A, B and C), fountain (D and E) and
mountain (F, G and H)



Table 2. Distance matrix using the Euclidean distance and 15 color and textural features
projected on the first two principal components

Distance       A       B       C       D       E       F       G       H
A              -
B           2.35       -
C           2.09    0.50       -
D          15.44   13.91   14.42       -
E          14.73   13.32   13.82    1.11       -
F           8.54    9.11    9.39   11.95   10.88       -
G          10.44   10.36   10.75    9.19    8.08    3.10       -
H          10.59   10.19   10.62    7.80    6.71    4.25    1.41       -
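
The effect described above can be checked directly on the values of Table 2. The sketch below is an illustration, not the authors' implementation: it stores the lower triangle of the table, lists the nearest neighbors of D, and applies condition (1) of Section 3 to the same distances. The helper names are hypothetical, and the graph obtained from these eight Euclidean distances is not claimed to coincide with the graph of Figure 4, which is built from the full feature set.

```python
LABELS = "ABCDEFGH"
# Lower triangle of Table 2: row i holds the distances to the labels before it.
LOWER = [
    [],
    [2.35],
    [2.09, 0.50],
    [15.44, 13.91, 14.42],
    [14.73, 13.32, 13.82, 1.11],
    [8.54, 9.11, 9.39, 11.95, 10.88],
    [10.44, 10.36, 10.75, 9.19, 8.08, 3.10],
    [10.59, 10.19, 10.62, 7.80, 6.71, 4.25, 1.41],
]

def dist(i, j):
    """Symmetric lookup in the lower-triangular table."""
    return 0.0 if i == j else LOWER[max(i, j)][min(i, j)]

def knn(label, k):
    """The k nearest neighbors of an image, by increasing distance."""
    i = LABELS.index(label)
    others = sorted((j for j in range(len(LABELS)) if j != i),
                    key=lambda j: dist(i, j))
    return [LABELS[j] for j in others[:k]]

def rng_neighbors(label):
    """Neighbors of an image under condition (1), using the same distances."""
    a = LABELS.index(label)
    return [LABELS[b] for b in range(len(LABELS)) if b != a and all(
        dist(a, b) <= max(dist(a, c), dist(b, c))
        for c in range(len(LABELS)) if c not in (a, b))]

print(knn("D", 2))          # ['E', 'H']: a mountain image is returned
print(rng_neighbors("D"))   # ['E']: only the other fountain image
```

With these distances the number of topological neighbors varies from image to image (D has one, A has two), whereas kNN always returns exactly k images.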



The neighborhood graph we propose is built as in Figure 4-a, using the color and
textural features of the images.

A visual exploration of the neighborhood graph shows that in this example the
three clusters correspond to the semantic classes. Distances between nodes in the
neighborhood graph are a good representation of the visual similarity between images.

When a query image I is presented to the system, the following algorithm is applied:

1. Calculate the p low-level features corresponding to the representation space;

2. Calculate the distances³ between the new point and all the existing points;

3. Insert the new point in the neighborhood graph by verifying the criterion specified in
Section 3.
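
The three steps above can be sketched as follows. This is a minimal illustration under assumptions made here: feature extraction (step 1) is replaced by hypothetical 2-D coordinates, the Euclidean distance stands in for the distance actually used, and the function names are not from the paper. The local update relies on the fact that a new point can only fill lunes, never empty them, so an existing edge is removed only when the new point falls inside its lune, and any new edge must involve the new point.

```python
import math
from itertools import combinations

def lune_is_empty(a, b, points):
    """True when no third point is closer to both a and b than d(a, b)."""
    d_ab = math.dist(points[a], points[b])
    return all(
        d_ab <= max(math.dist(points[a], points[c]), math.dist(points[b], points[c]))
        for c in points if c not in (a, b)
    )

def build_rng(points):
    """Full construction, used here only as a reference for comparison."""
    return {frozenset((a, b)) for a, b in combinations(points, 2)
            if lune_is_empty(a, b, points)}

def insert_query(points, edges, name, coords):
    """Step 3: insert the query point and update the graph locally."""
    updated = dict(points)
    updated[name] = coords
    kept = set()
    for edge in edges:
        a, b = tuple(edge)
        # An old edge disappears only if the new point falls inside its lune.
        in_lune = max(math.dist(updated[a], coords),
                      math.dist(updated[b], coords)) < math.dist(updated[a], updated[b])
        if not in_lune:
            kept.add(edge)
    # New edges can only involve the inserted point.
    new = {frozenset((name, x)) for x in points if lune_is_empty(name, x, updated)}
    return updated, kept | new

# Hypothetical 2-D feature vectors standing in for step 1.
database = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (0.5, 3.0)}
graph = build_rng(database)
database_q, graph_q = insert_query(database, graph, "Q", (1.0, 0.2))
print(sorted(sorted(e) for e in graph_q))
```

In this example the query point lands between the three stored points, so the two existing edges are removed and Q becomes linked to A, B and C. The result is the same graph that a full rebuild would produce.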

The advantages of this method are the following. First, the neighborhood graph is updated
locally; it is not entirely rebuilt. Second, the user can browse the neighborhood graph and
visit all the neighbors of the query image. The number of neighbors is not fixed, as it is in
the case of the kNN algorithm, where exactly k points are returned as the query result. Third,
the neighborhood relationship defined by the RNG is symmetric and more appropriate for
the browsing process, as shown in Section 2.

³ We considered the Cosine distance, but other distances may be used as well.



Fig. 4. The relative neighborhood graph. The nodes have different colors representing semantic
categories. Left (4-a): without the query image; right (4-b): with the query image included

Figure 4-b presents an example of a topological query. The new image is inserted in
the neighborhood graph and the user can browse all of its neighbors. A new mountain
image is used as the query image (it does not belong to the database, but it is very similar
to the other mountain images in the image database).



5 Experimentation 

In our experiments we used a set of 259 images divided into six main categories,
extracted from the Ground Truth Database⁴ (University of Washington). We used the
predefined image categories as semantic information. We considered the following
categories: “Arborgreens”, “Australia”, “Cherries”, “SwissMountains”, “Greenlake”
and “SpringFlowers”. We used two categories of features automatically extracted
from the images: color features (normalized L1 and L2, predominant color) and textural
features (the 14 features defined by Haralick in [4]). Numerical features may
represent the whole image or objects inside the image. The features we used here are
all global features. In future work we will perform image segmentation and also use
shape features on segmented regions.



5.1 Comparison Protocol 

For a given representation space, we compare the coherence of a neighborhood in a
relative neighborhood graph (RNG) with the coherence of the k nearest neighbors (kNN)
of an image. To evaluate this coherence, RNG and kNN are compared in a classification
context. Classification performance is usually measured in terms of the
classic information retrieval notions of recall and precision [10]. Test images are
spread over the six semantic categories used to compute recall and precision. We note
that the relative neighborhood graph is built from low-level features only.

⁴ http://www.cs.washington.edu/research/imagedatabase

The recall and precision are defined as follows:

$$\text{recall} = \frac{\text{number of categories found and correct}}{\text{total number of categories correct}}, \qquad \text{precision} = \frac{\text{number of categories found and correct}}{\text{total number of categories found}}$$
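
One possible reading of these two ratios, computed per category from the true and predicted labels of a set of test images, is sketched below. The function name and the sample labels are hypothetical, and the per-category interpretation is an assumption made here rather than a detail stated in the paper.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Precision and recall of one category from paired true/predicted labels."""
    pairs = list(zip(true_labels, predicted_labels))
    found_and_correct = sum(t == category and p == category for t, p in pairs)
    total_correct = sum(t == category for t, _ in pairs)   # images truly in the category
    total_found = sum(p == category for _, p in pairs)     # images assigned to the category
    precision = found_and_correct / total_found if total_found else 0.0
    recall = found_and_correct / total_correct if total_correct else 0.0
    return precision, recall

# Hypothetical classification results for five test images.
true_labels = ["Cherries", "Cherries", "Australia", "Australia", "Australia"]
predicted = ["Cherries", "Australia", "Australia", "Australia", "Cherries"]
print(precision_recall(true_labels, predicted, "Australia"))
print(precision_recall(true_labels, predicted, "Cherries"))
```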

We evaluate the capabilities of the graph used as a classifier, in order to find the
category of the query image. To do that, the query image is inserted in the graph
structure and its category is decided by the votes of its neighbors, inversely weighted by
the length of the edges. The length of an edge represents the distance between the two
linked nodes. Note that, even if two nodes are close in terms of distance, in the RNG they
are linked by an edge only if they satisfy condition (1) presented in Section 3. For the kNN
classifiers, the category is decided by the votes of the k nearest neighbors of the unknown
image, inversely weighted by the distance. In our tests we used the Cosine distance,
since it is scale invariant and therefore does not require normalizing the data. The
number of neighbors (k) for kNN varies from 1 to 5 and the classifier results are compared
after a 10-fold cross-validation.
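
The voting rule can be sketched as below. The same function serves both classifiers: for the RNG the list holds the graph neighbors of the query with their edge lengths, for kNN it holds the k nearest images with their distances. The function name, the weight 1/distance and the sample values are assumptions made for illustration; the paper does not give the exact weighting formula.

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """Pick a category from (category, distance) pairs, each vote weighted by 1/distance.

    Distances are assumed to be strictly positive.
    """
    scores = defaultdict(float)
    for category, distance in neighbors:
        scores[category] += 1.0 / distance
    return max(scores, key=scores.get)

# Hypothetical neighbors of a query image: one close image outweighs two distant ones.
neighbors = [("SwissMountains", 0.2), ("Greenlake", 0.5), ("Greenlake", 0.6)]
print(weighted_vote(neighbors))   # SwissMountains
```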



5.2 Results and Discussion 

Figure 5 shows the results obtained from our experiments. We observe that RNG
outperforms the different kNN models. For kNN we have performed tests up to 30
neighbors (k=30). Our results show that the success rate, the precision and the recall in
the case of RNG are superior to the best kNN results, obtained for 4NN. We can explain
the better results for RNG by examining the category prediction process. In the
case of kNN, exactly k neighbors vote for the query image category. In the case of
RNG, the number of neighbors adapts to the topology.




Fig. 5. Relative Neighborhood Graph versus k-Nearest Neighbors (k varies from 1 to 5)



6 Conclusions and Future Work 

Search algorithms in image databases usually return the k nearest neighbors (kNN) of an
image according to a similarity measure. This approach presents some anomalies and
is based on assumptions that are not always satisfied. We have examined the causes of
these anomalies and we have concluded that image query models have to exploit
topological properties rather than the degree of similarity. The knowledge inside an
image database lies in the topological structure of a set of points (images) rather than
in the distances between them. We have proposed a topological representation method
based on neighborhood graphs built on automatically extracted image features.
Meanwhile, the MPEG-7 standard is gradually being established, and query models could be
built on MPEG-7 descriptors of multimedia data. Automatic extraction of semantic descriptors
of audio-visual content still remains a problem, as does their pertinent exploitation.
The topological model proposed in this paper offers an exploratory analysis of MPEG-7
descriptors and also allows these descriptors to be used in a query process. In future
work we will use MPEG-7 files as the data source for our topological model. We are also
working on a faster retrieval algorithm using relative neighborhood graphs.
Abstract. 3D shapes can be reconstructed from 2D silhouettes by back-projecting
them from the corresponding viewpoints and intersecting the
resulting solid cones. This requires knowing the position of the viewpoints with
respect to the object. But what can we say when this information is not
available? This paper provides a first insight into the problem of
understanding 3D shapes from silhouettes when the relative
positions of the viewpoints are unknown. In particular, the case of orthographic
silhouettes with viewing directions parallel to the same plane is thoroughly
discussed. We also introduce sets of inequalities that describe all the
possible solution sets, and show how to calculate the feasible solution space of
each set.



1 Introduction 

A central problem in computer vision is understanding the shape of 3D objects from
various image features. Many algorithms are based on occluding contours or
silhouettes. The main approach is volumetric, and consists in building the volume R
shared by the regions $C_i$ (see Fig. 1) obtained by back-projecting each silhouette $S_i$
from the corresponding viewpoint. This simple reconstruction technique is called
Volume Intersection (VI) (see [1], [4], [8], [9], [10]). It requires the 3D positions of
silhouettes and viewpoints. However, in several practical situations this information is
not available and therefore VI cannot be performed. Even if this simple reconstruction
technique is not possible, we would like to make the best of the available information.

Before entering the problem, we briefly review some definitions relevant to our
problem. First, the concept of the visual hull of an object [6], which is the object that can
be obtained by VI using all the viewpoints that belong to a viewing region completely
enclosing the original object without entering its convex hull. It is also the largest
object that produces the same silhouettes as the given object. A point of the surface of
the reconstructed object R is a hard point [6] if it belongs to any object that produces
the same silhouettes from the same viewpoints. The concept of hard point allows
stating a necessary condition for the reconstruction to be optimal, and is at the basis of
interactive VI algorithms [3].

In the following, for brevity, we will use the expression “set of silhouettes” to
specify a set of silhouettes together with the position of the corresponding viewpoint
with respect to each silhouette. These data allow constructing a solid cone for each
silhouette, but not positioning the cones in 3D space. To understand how the 3D
shape is related to such a set of silhouettes, two main questions can be considered.
The first question is: given a set of silhouettes, does an object exist that is able to
produce them? We will call a set of silhouettes compatible if the same object can generate
them. An object able to produce a compatible set of silhouettes will be said to be
compatible with the set. The second question is the main practical issue: how can we
find one or more compatible objects given a compatible set of silhouettes, such as one
produced by a real object? We will present a set of results that provide a first insight
into the problem.




Fig. 1. The volume intersection technique 



2 Compatibility of Orthographic Silhouettes of 3D Objects 

In the rest of this paper we will restrict ourselves to consider simply connected 3D 
objects and their orthographic projections. This approximates the practical case of 
objects small with respect to their distance from the camera. The reader is referred to 
[12] for a proof of the statements of this section. 

First, we will investigate the compatibility of two silhouettes. Let S be a 2D
orthographic silhouette of a 3D object. Let us project S orthographically along a
direction in the plane of S. The 1D silhouette obtained depends on the angle $\alpha$ that the
chosen direction makes with the x axis of a coordinate system fixed with respect to S
(Fig. 2). Let $L(S, \alpha)$ be the length of the 1D silhouette of S. The following statement
holds.

Proposition 1. A necessary and sufficient condition for two orthographic silhouettes
$S_1$ and $S_2$ to be compatible is that two angles $\alpha_1$ and $\alpha_2$ exist such that
$L(S_1, \alpha_1) = L(S_2, \alpha_2)$.
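
Proposition 1 lends itself to a simple numerical check. Since the length of the 1D silhouette varies continuously with the projection angle, the values it takes form an interval, and two angles with equal lengths exist exactly when the two intervals overlap. The sketch below approximates the silhouettes by polygons and samples the angle; the polygonal representation, the sampling step and all function names are assumptions made here for illustration, and the sampled test is only approximate when the intervals barely touch.

```python
import math

def projected_length(polygon, alpha):
    """Length L(S, alpha) of the 1D silhouette of a polygon projected along angle alpha."""
    nx, ny = -math.sin(alpha), math.cos(alpha)   # unit normal to the projection direction
    coords = [x * nx + y * ny for x, y in polygon]
    return max(coords) - min(coords)

def length_range(polygon, samples=3600):
    """Approximate minimum and maximum of L(S, alpha) over a half turn."""
    lengths = [projected_length(polygon, math.pi * i / samples) for i in range(samples)]
    return min(lengths), max(lengths)

def may_be_compatible(poly1, poly2, samples=3600):
    """Sampled version of Proposition 1: do the two ranges of lengths overlap?"""
    lo1, hi1 = length_range(poly1, samples)
    lo2, hi2 = length_range(poly2, samples)
    return max(lo1, lo2) <= min(hi1, hi2)

def rectangle(width, height):
    return [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]

square = rectangle(1.0, 1.0)
print(may_be_compatible(square, rectangle(1.2, 0.5)))   # True
print(may_be_compatible(square, rectangle(3.0, 2.0)))   # False
```

The unit square has projected lengths between 1 and the length of its diagonal, so it cannot be paired with a 3 by 2 rectangle, whose projected lengths are never smaller than 2.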

What happens when we have to deal with more silhouettes? That is, how can we 
find if three or more silhouettes are compatible? Clearly, we have that: 

Proposition 2. A necessary condition for a set of silhouettes to be compatible is that 
all pairs of silhouettes of the set are compatible. 
Fig. 2. The 1D silhouette $L(S, \alpha)$

Fig. 3. The strip ST(V) and the curve $C_V$



However, in general, being compatible in pairs is not sufficient for a set of
silhouettes to be compatible (see [12]). A necessary and sufficient condition for the
compatibility of more than two silhouettes can be found by considering a property of the
reconstructed object R. Let us consider one of the silhouettes involved in the process,
the corresponding viewing direction V and the cylinder circumscribed to the object O
made of lines parallel to this direction (Fig. 3). Each line of this cylindrical surface
must share at least one point with the surface of O. These points form a curve $C_V$.
This curve belongs to an annular surface, a strip ST(V) of variable width, which is
what is left of the original circumscribed cylinder after the various intersections.
During the reconstruction process, this annular strip cannot be interrupted; at most it
can reduce to a curve with zero width. In this case, the curve consists of hard points.
Therefore we can formulate the following condition for the VI algorithm to be
feasible:

Proposition 3. A necessary and sufficient condition for a set of silhouettes to be 
compatible is that it be possible to find viewpoints such that no annular strip of the 
reconstructed object is interrupted. 

In the next sections this condition will be used for constructing algorithms both for 
verifying the compatibility of a set of silhouettes and reconstructing compatible 3D 
objects. 



3 Silhouettes with Viewing Directions Parallel to a Plane 

In this section we deal with a particular case of the general problem, where all 
viewing directions are parallel to the same plane (Fig. 4). Clearly, all silhouettes have 
the same height and the same plane must support all cylinders obtained by back- 
projection. 




Fig. 4. Viewing directions parallel to the same plane 
Fig. 5. Notations used for a silhouette.

Fig. 6. (a) A case where $S_3(y)$ is compatible with $S_1(y)$ and $S_2(y)$ in a horizontal plane. (b) The
condition for the compatibility of the whole silhouettes.



We consider first the compatibility of three silhouettes $S_1$, $S_2$ and $S_3$. Each planar
silhouette $S_i$ is defined, for $0 \le y \le y_{max}$, by two curves $S_{il}(y)$ and $S_{ir}(y)$ (see Fig. 5). For
simplicity, let us consider single-valued functions. Also let $S_i(y) = S_{ir}(y) - S_{il}(y)$. Let us
consider a horizontal plane corresponding to a value of y between 0 and $y_{max}$, and its
intersection with the three cylinders obtained by back-projecting the silhouettes. Let
us consider in this plane the arrangement of the 2D silhouettes $S_1(y)$, $S_2(y)$, $S_3(y)$ and
of the viewpoints $V_1$, $V_2$, $V_3$ shown in Fig. 6(a). It is not difficult to see that Proposition
3 requires that the two lines projecting the endpoints of $S_3(y)$ along the direction $V_3$
lie inside the two areas highlighted in Fig. 6(a). For the whole silhouettes to be
compatible, this must hold for all y. For the reconstruction to be possible, $S_{3l}(y)$ must
lie between the two leftmost curves, in this case the projections of the vertices 3 and
4, and $S_{3r}(y)$ must lie between the two rightmost curves, the projections of the vertices
1 and 2.




Fig. 7. The intersections in a horizontal plane 
To derive the set of inequalities that define the feasible intersection parameters
for this case, let us inspect in more detail the intersection in a horizontal plane (Fig. 7).
Let $O_1$, $O_2$, $O_3$ be the intersections of the y axes of the coordinate systems of the
silhouettes with this plane. Intersecting $S_1(y)$ and $S_2(y)$ requires fixing an angle, say
$\alpha_1$. Intersecting also $S_3(y)$ requires choosing two more parameters: the angle $\alpha_2$ and a
distance, say d (see Fig. 8). d is the distance between two points lying on the line
projecting $O_1$ along the direction $V_1$. The first is the intersection of this line with the
line projecting $O_2$ along $V_2$, and the second is the intersection with the line projecting
$O_3$ along $V_3$. Thus, to find feasible solutions we must search the 3-dimensional space
$[\alpha_1, \alpha_2, d]$. Let $P_1(y)$, $P_2(y)$, $P_3(y)$ and $P_4(y)$ be the distances from $O_3$ of the
orthographic projections of the vertices of the parallelogram onto the line supporting
$S_3(y)$. The compatibility condition for the three silhouettes is expressed by the following
inequalities:

S3/yj>P/yj S 3 ,(y)<P 3 (yA 83,(7) >P/yl ( 1 ) 

83,(7) <P,(yj P4(7)>P/7) 

In (1), the purpose of the fifth inequality is to characterize the case just analysed,
which we call Case 1. Seven other cases, determined by the direction of $V_3$ with respect to
$V_1$, $V_2$ and the directions of the two diagonals of the parallelogram, are
possible, each producing a different set of inequalities (see Fig. 8). For each case, a
possible orthographic projection onto the plane of $S_3$ of the edges of the object
produced by the first intersection is shown with thick lines. The boundaries of $S_3$ are
the thin lines.

Four Silhouettes 

Let us consider Case 1 and add a fourth silhouette $S_4$. In each horizontal plane
$S_1(y)$, $S_2(y)$ and $S_3(y)$ produce a polygon with six vertices and three pairs of parallel
edges (Fig. 9). The new intersection is defined by two more parameters: the angle
between $V_1$ and $V_4$ and a second distance measured, like d, along the line that projects
$O_1$ along $V_1$. Satisfying the condition of Proposition 3 requires, in each horizontal plane,
cutting away two opposite vertices without completely eliminating the edges that meet
at these vertices. By orthographically projecting the six vertices onto the plane of $S_4$
we obtain six curves. For the new intersection to be feasible, the boundaries $S_{4l}(y)$ and
$S_{4r}(y)$ of $S_4$ must lie in the areas bounded by the two leftmost and the two rightmost
curves respectively.
Various sets of inequalities result, depending on the direction of $V_4$. First, let us
distinguish two cases (cases (a) and (b) on the left of Fig. 9) related to the directions
which determine the leftmost and rightmost vertices (5 and 7 for case (a) and 7 and 5
for (b)). In each case we have four sub-cases for the leftmost and rightmost strips
where $S_{4l}$ and $S_{4r}$ must lie (see the right part of Fig. 9). The inequalities corresponding
to each sub-case are easily written. For instance, for the sub-case $a_1$ they are:

P,(y)<S,,(y) S,fy)<PSy) 

P,iy)<S,^iy) S,fy)<P,iy) 

P,(y)<P,(y) P,(y)<P,(y) 

where $P_i(y)$ are the projections of the points $i(y)$ onto the plane of $S_4$. As before, the
last two inequalities guarantee that the inner boundaries of these areas are actually
the curves assumed in this sub-case.

Summarizing, each set of inequalities that defines feasible intersection parameters
for four silhouettes contains 11 inequalities (the five inequalities related to the first
three silhouettes and six new inequalities also referring to $S_4$). As for the number of
sets of inequalities, we have 8 cases for three silhouettes, 3 pairs of opposite vertices
and 8 cases for each pair, and thus 8 · 3 · 8 = 192 sets, each containing 11 inequalities.




Fig. 9. Cases (a) and (b) and the 8 sub-cases 



Five or More Silhouettes 

The previous discussion about the fourth silhouette holds for any further
silhouette. In fact, we must always cut a pair of opposite vertices without completely
deleting the edges converging at these vertices. It follows that each new silhouette
adds two parameters and six inequalities for each case. Thus, for n silhouettes, the
number of parameters is $2n - 3$, and the number of inequalities is $6(n-3) + 5$ ($n \ge 3$).
Each new silhouette adds 8 sub-cases for each pair of opposite vertices. For the n-th
silhouette, the number of pairs of opposite vertices is $n - 1$. Let $N(n)$ be the number of
sets of inequalities for n silhouettes. For $n > 3$ it is $N(n) = 8(n-1)N(n-1)$. Therefore we
must face an exponential growth of the number of cases.
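
The growth can be made concrete with a few lines of code. The sketch below reads the counts as 2n-3 parameters, 6(n-3)+5 inequalities per set and N(n) = 8(n-1)N(n-1) sets with N(3) = 8; this reading is an assumption, chosen because it reproduces the figures stated in the text (5 and 11 inequalities, 8 and 192 sets for three and four silhouettes). The function names are introduced here for illustration.

```python
def num_parameters(n):
    """Number of intersection parameters for n >= 3 silhouettes."""
    return 2 * n - 3

def num_inequalities(n):
    """Number of inequalities in each set for n >= 3 silhouettes."""
    return 6 * (n - 3) + 5

def num_inequality_sets(n):
    """Number of sets of inequalities, from N(3) = 8 and N(n) = 8 (n - 1) N(n - 1)."""
    if n == 3:
        return 8
    return 8 * (n - 1) * num_inequality_sets(n - 1)

for n in range(3, 7):
    print(n, num_parameters(n), num_inequalities(n), num_inequality_sets(n))
```

Under this reading, five silhouettes already give 6144 sets of 17 inequalities each, and six silhouettes give 245760 sets.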



4 Writing the Sets of Inequalities 

The inequalities discussed in the previous section allow us to answer, in a particular
case, both questions raised in the introduction: finding objects compatible with a set of
compatible silhouettes, and understanding whether an (artificial) set of silhouettes is
compatible. We have developed an algorithm for automatically writing the sets of
inequalities, which works on the following basis. In this section we renumber the
silhouettes starting from $S_0$, and not $S_1$, in order to handle the indices more easily. The
axes of the reference system are aligned with the axes of the projection of $S_0$ on the
plane. Let us assume, without loss of generality, that $V_0$ is parallel to the y axis of the
reference system and that the line supporting $S_0$ is parallel to the x axis. The origin of the
reference system coincides with the intersection of the projections of $O_0$ along $V_0$
and $O_1$ along $V_1$ on the plane. The position of the i-th silhouette is determined by two
parameters, $d_i$ and $\alpha_i$, where $d_i$ is the distance between the projection onto the y axis of
the i-th origin $O_i$ along $V_i$ and the origin (hence $d_0 = d_1 = 0$), and $\alpha_i$ is the angle between
$V_i$ and $V_0$ ($\alpha_0 = 0$). We assume that the angle is positive if $V_0 \times V_i$ has the same
direction as $x \times y$; it also follows that $V_i = (\sin(\alpha_i), -\cos(\alpha_i))$. Let $C_j$ be the vertices
of the polygon resulting from intersecting $S_0$ and $S_1$. The equations of the first 4 vertices
(Fig. 10) are:
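$$C_j = \left(S_{0a},\ \frac{S_{1b}}{\sin(\alpha_1)} - \frac{S_{0a}}{\tan(\alpha_1)}\right), \qquad a, b \in \{l, r\},$$

with one vertex for each of the four combinations of a left or right boundary of $S_0$ with a left or right boundary of $S_1$.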

The sets of inequalities previously introduced can be written in terms of the
distances from the origin along the y axis of the projections of the vertices of the
parallelogram and of $S_i$ along the viewing direction of the i-th silhouette. For
each projection, the lines passing through the vertices of the polygons have equations
$C_j + V_i t$ and their intersections ($P_{ij}$) with the y axis of the reference system are given by
$P_{ij} = c_{jy} + c_{jx} / \tan(\alpha_i)$, where $(c_{jx}, c_{jy})$ are the coordinates of $C_j$. Now, let $d_{il}$, $d_{ir}$
be the projections on the y axis of $S_{il}$ and $S_{ir}$. It follows that:

$$d_{il} = d_i + \frac{S_{il}(y)}{\sin(\alpha_i)}, \qquad d_{ir} = d_i + \frac{S_{ir}(y)}{\sin(\alpha_i)}$$
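
These projections can be checked numerically. The sketch below assumes the reading used in this section: viewing direction $(\sin\alpha, -\cos\alpha)$, a vertex projected onto the y axis at $c_y + c_x/\tan\alpha$, and a silhouette boundary at signed offset s projected at $d_i + s/\sin\alpha$. The function names and sample values are hypothetical. The check confirms that a vertex built from a boundary of $S_0$ and a boundary of $S_1$ projects back, along $V_1$, onto the projection of that boundary of $S_1$.

```python
import math

def project_on_y_axis(point, alpha):
    """y coordinate where the line through `point` with direction
    (sin(alpha), -cos(alpha)) meets the y axis (requires sin(alpha) != 0)."""
    cx, cy = point
    return cy + cx / math.tan(alpha)

def boundary_on_y_axis(d_i, s, alpha):
    """Projection on the y axis of a silhouette boundary at signed offset s."""
    return d_i + s / math.sin(alpha)

def first_vertex(s0, s1, alpha1):
    """Intersection of the line x = s0 (a boundary of S_0) with the projection
    line of a boundary of S_1 at offset s1, with d_1 = 0."""
    return (s0, s1 / math.sin(alpha1) - s0 / math.tan(alpha1))

alpha1 = math.pi / 3                      # hypothetical angle between V_1 and V_0
vertex = first_vertex(1.0, 2.0, alpha1)   # hypothetical boundary offsets
print(project_on_y_axis(vertex, alpha1), boundary_on_y_axis(0.0, 2.0, alpha1))
```

Both printed values agree up to rounding, as expected, since the vertex lies on the projection line of the chosen boundary of $S_1$.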

Projecting the vertices and S, onto the y axis, the verse of the inequalities also 
depends on the value of the angle between the current viewing direction and V„. For 
instance, in the example shown in Fig. 1 1, we have: 

P 21 ^ < P 22 - ^24 - ^2, ^ ^23 . 0<a^<7l 

P21 S ^2, > P22 > P24 S <^2, > P23 , 71 < a ^< 2 n 



In order to be able to write the inequalities in an automatic way, the general form 
of the inequalities can be rewritten by multiplying each term by sin(α_i). In the previous 
example, the set of inequalities becomes: 

P_21 sin(α_2) ≤ d_2l sin(α_2) 
d_2l sin(α_2) ≤ P_22 sin(α_2) 
P_22 sin(α_2) ≤ P_24 sin(α_2) 
P_24 sin(α_2) ≤ d_2r sin(α_2) 
d_2r sin(α_2) ≤ P_23 sin(α_2) 

or 

sin(α_2)(P_21 − d_2l) ≤ 0 
sin(α_2)(d_2l − P_22) ≤ 0 
sin(α_2)(P_22 − P_24) ≤ 0 
sin(α_2)(P_24 − d_2r) ≤ 0 
sin(α_2)(d_2r − P_23) ≤ 0 

In general, each term of the inequalities will be multiplied by sin(α_i). 
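
The effect of the sin(α) factor can be checked numerically. The following is only a minimal sketch: the function name `ordered` and the sample values are hypothetical and not taken from the paper. It shows that a single test of the form sin(α)(a − b) ≤ 0 encodes a ≤ b when 0 < α < π and a ≥ b when π < α < 2π.

```python
import math

def ordered(a, b, alpha):
    """Return True if sin(alpha) * (a - b) <= 0.

    For 0 < alpha < pi this is equivalent to a <= b;
    for pi < alpha < 2*pi it is equivalent to a >= b.
    """
    return math.sin(alpha) * (a - b) <= 0.0

# The same pair of projections tested for two viewing angles.
print(ordered(1.0, 2.0, math.pi / 2))      # a <= b is the required order here
print(ordered(1.0, 2.0, 3 * math.pi / 2))  # the reversed order is required here
```

Writing every inequality in this form avoids a separate case analysis on the sign of sin(α) when the inequalities are generated automatically.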

Each new vertex C_j, j > 4, is the intersection of the line every edge lies on and the 
specific projection line relative to V_i. All these lines are projection lines, and can be 
written as: 

D_il + V_i t or D_ir + V_i t, where: 

D_il = (S_0l, 0), i = 0; (0, d_il), i > 0 D_ir = (S_0r, 0), i = 0; (0, d_ir), i > 0 

Fig. 10. 



Fig. 11. 



5 Solving the Inequalities 

A set inversion technique ([7]) has been applied for finding the feasible solution set S 
of the set of non-linear inequalities that characterizes each sub-case. This technique 
performs a paving of the parameter space with boxes. If the current box [p] is proved to 
be inside S, then the box is kept as part of the solution space. If it is proved to have an 
empty intersection with S then it is discarded. Otherwise, [p] is bisected except if its 
width is smaller than a defined threshold. The dimensionality of the initial box is 
equal to the number of variables involved in the set of inequalities. To prove that a 
given box [p] is inside S, interval computation ([13]) has been used. The technique 
illustrated is used to find feasible parameter sets for one value of y between 0 and 
Each feasible parameter set corresponds to a group of inequalities that can take place 
for the same object. If one of the parameter sets is empty, the corresponding group of 
inequalities can be discarded. Otherwise, we could perform an incremental 
computation, adding each time (or subtracting) a small Δy, related to the shape of the 
silhouettes, to the previous y or, in the case of polygonal silhouettes, taking as y+Δy 
the height of the next horizontal strip. For each group of inequalities, the new feasible 
parameter set at y+Δy must be a subset of the set at y. The cases are arranged in a tree, 
whose depth is the number of silhouettes. Instead of considering all the leaves at the 
lower level of the tree, that is all the intersections with all the silhouettes, we start the 
computation at higher level. If an inequality group has an empty feasible parameter 
set, the child cases can be discarded. Also, the initial feasible parameter set for each 
child is derived from the one evaluated for the father and it is not taken as the whole 
initial box. 
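
The paving procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the single test inequality x² + y² − 1 ≤ 0, the hand-written interval bounds and the names `interval_sq`, `f_range` and `sivia` are all hypothetical stand-ins for the inequality sets and interval library used in the paper.

```python
def interval_sq(lo, hi):
    """Tight enclosure of {x*x : lo <= x <= hi}."""
    cands = (lo * lo, hi * hi)
    low = 0.0 if lo <= 0.0 <= hi else min(cands)
    return low, max(cands)

def f_range(box):
    """Interval enclosure of f(x, y) = x^2 + y^2 - 1 over a 2D box."""
    (xl, xh), (yl, yh) = box
    sx, sy = interval_sq(xl, xh), interval_sq(yl, yh)
    return sx[0] + sy[0] - 1.0, sx[1] + sy[1] - 1.0

def sivia(box, eps):
    """Pave S = {p : f(p) <= 0}, starting from an initial box."""
    inside, boundary, stack = [], [], [box]
    while stack:
        b = stack.pop()
        lo, hi = f_range(b)
        if hi <= 0.0:              # proved to be inside S: keep
            inside.append(b)
        elif lo > 0.0:             # proved disjoint from S: discard
            continue
        else:
            widths = [h - l for l, h in b]
            k = max(range(len(b)), key=lambda i: widths[i])
            if widths[k] < eps:    # below the threshold: left undetermined
                boundary.append(b)
            else:                  # bisect along the widest side
                l, h = b[k]
                m = 0.5 * (l + h)
                left, right = list(b), list(b)
                left[k], right[k] = (l, m), (m, h)
                stack += [tuple(left), tuple(right)]
    return inside, boundary

inside, boundary = sivia(((-2.0, 2.0), (-2.0, 2.0)), 0.05)
print(len(inside), len(boundary))
```

The boxes in `inside` form a guaranteed inner approximation of the feasible set, while `inside` together with `boundary` forms an outer approximation; a smaller threshold narrows the gap between the two.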

In order to assess the validity of the approach described, we have tested the 
algorithms in a virtual environment. An orthographic camera rotating on a plane around 
the object has been used to create silhouettes of synthetic objects. The paving 
technique introduced has been used to find feasible parameter sets satisfying the 
inequality sets. For each point of the solution, a compatible object can be 
reconstructed using VI. We have tested the approach with different minimal 
paving resolutions and different numbers of silhouettes. Given the three silhouettes S0, 
S1 and S2 of Fig. 10 as input, several different compatible objects, each one 
reconstructed from one of the eight different feasible sets, can be seen in Fig. 11. 
Other examples can be seen in Fig. 12 and Fig. 13. Finally, in Fig. 14, the expanded 
tree of the sub cases generated by four silhouettes of the chamfer box of Fig. 13 can be 
seen. The dark nodes correspond to the open nodes. 




Fig. 10. The silhouettes SO, SI and S2 




Fig. 11. Objects compatible with the silhouettes of Fig. 10 




Fig. 12. A boat 




Fig. 13. A chamfer box 
Fig. 14. Solution tree 

6 Conclusions 

We have introduced and explored the problem of understanding the shape of 3D 
objects from silhouettes when the relative position of the viewpoints is not known, 
which happens in several practical cases. We have presented a necessary and 
sufficient condition for a set of orthographic silhouettes to be compatible. This 
condition has been applied to the particular case of orthographic projections with 
viewing directions parallel to a plane. For this case, we have been able to work out 
sets of inequalities, involving the volume intersection parameters, which allow 
computing feasible solution sets. An algorithm for automatically writing the 
inequalities has been developed, and a technique involving the paving of the 
parameter space has been introduced to evaluate, if they exist, feasible parameter sets 
satisfying the inequalities. 

Several problems remain open, among them the case of orthographic projection with 
unrestricted viewing directions and the case of perspective projections. We will also 
study and discuss thoroughly the case of generic silhouettes, which are not simply 
connected or do not have their boundaries defined by single-valued functions. 
Another question is worth considering. Except for special cases, we expect that 
infinitely many compatible objects exist, specified by a region in the space of the intersection 
parameters. Simple ways of describing the shape of the compatible objects seem 
desirable. 

Abstract. We have developed an algorithm capable of enforcing a shape 
correspondence between two views of the same object in different shape- 
states. This algorithm, together with several other significant updates, 
has helped improve the performance of the Integrated Shape and Pose 
Model (ISPM) described in [1] by a factor of 10. The ISPM utilizes two 
flexible basis views to integrate the linear combination of views technique 
with a coupled-view Flexible Shape Model (FSM) [2]. As a proof-of-principle 
we have evaluated the performance of the improved ISPM in 
comparison to that of its predecessor [1] and of the conventional FSM [3], 
via two different databases. The results show that, unlike the FSM, the 
current ISPM is view-invariant and that, on average, it out-performs the 
FSM. It also out-performs the initial ISPM described in [1]. 



1 Introduction and Background 

Machine vision systems that utilize two or more two-dimensional (2D) images 
to represent three-dimensional (3D) objects have recently become quite popular 
because they are sufficient for many purposes, while being computationally 
relatively easy to build. In particular, not building an explicit 3D model means that 
we can avoid poorly conditioned 3D reconstruction steps and can therefore, for 
example, generate virtual images with less noise [4]. There is also some evidence 
to suggest that such view-based representations are used by the human visual 
system [5]. Ullman and Basri [6] developed the view-based approach, also known 
as the Linear Combination of Views (LCV) technique, though only for representing 
rigid objects. In the LCV technique, any image of a 3D object is represented 
as a linear combination of at least “1½” other images of the same object. Ullman 
and Basri [6] used line-drawings, whilst others have taken this concept further 
to the combination of real images [7,8] but using an over-complete approach so 
that the basis views may be treated symmetrically. We reformulated the 
over-complete LCV approach [1], via the Centred Affine Trifocal Tensor (CATT) [4], 
introducing the required constraints [9]. 

Thus far, however, the LCV technique has only been used to represent rigid 
3D objects. We have taken this approach even further to model non-rigid 3D 
objects. For this we integrate an LCV model with a Coupled-View Flexible 
Shape Model (CVFSM) [2], via two flexible basis views, to form the Integrated 
Shape and Pose Model (ISPM), which was first introduced in [1]. In order to 
generate such a model we need: (i) a technique for mapping the intrinsic shape 
from any given image, simultaneously to two (or more) preselected views and, 
(ii) a technique for ensuring the two mapped shapes correspond to the same 
3D shape (as though they were images captured simultaneously from different 
views). Given two such techniques, we could train a CVFSM on almost any given 
set of images, by first transferring the intrinsic shape from each given image to 
two preselected views providing the required corresponding pairs of images [2]. 
Once we have two such flexible basis views, we can synthesize, via the CVFSM, 
an image of the object in any view by, for example, the LCV technique. 

The first of the above mentioned techniques, (i), which we refer to as a 
2D Pose Alignment, was first described in [1]. However, since we were missing 
technique (ii) for ensuring the two mapped shapes correspond to the same 3D 
shape, the success of the ISPM described in [1], though encouraging, was limited. 
We have now developed the second technique, (ii), which is explained in Sect. 3 
of this paper. It has, along with several other new steps, described in Sect. 2.2, 
helped improve the performance of the ISPM by an order of magnitude. 



2 An Implicit (2D) Pose Alignment via the CATT 

Here we assume, for the moment, that we are given two images of the mean 
shape as seen from two preselected views (the basis views) i.e. we have two sets 
of corresponding landmark points, X̄′ & X̄″, respectively from two images of 
the same shape (the mean shape) as seen from the two basis views. Let’s also use 
two sets of corresponding landmark points X′_i & X″_i to represent the images, as 
seen from the two basis views, of the shape in a given image, i say, represented 
by the landmark points X_i (i.e. X′_i & X″_i have the same shape). The aim 
of the 2D (implicit) Pose Alignment (2D-PA) process is then to recover X′_i & 
X″_i given X_i, X̄′ & X̄″. 



2.1 The Subset of Stable Points 

Before we begin the 2D-PA process, we first select a subset of at least 4 non-co-planar 
landmark points that can be considered as forming a rigid sub-object. For 
this we employ a RANSAC algorithm [10] to select a subset of p (≥ 4) landmark 
points that best conforms to the constraints of multi-view geometry for a rigid 
object by minimizing 

e² = Σ_i e²(i), where for each X_i, e²(i) = ||T(i) Y(i)||² , (1) 

T(i) is the CATT matrix [1] and Y(i) = (X_i^T, X̄′^T, X̄″^T)^T. Here, in each case, 
each image is represented in X_i, X̄′ & X̄″ only by the subset of p landmark 
points being considered. In our experiments we used a subset of 6 stable points 
(i.e. p = 6) and manually checked that the selected points were not co-planar. If 
the selected subset were co-planar, then we continued to check the subset that 
provides the next smallest value for e² until we found a non-co-planar one. 



2.2 The 2D Pose Alignment (2D-PA) Process 

Given the stable points, we begin the 2D-PA process, as in [1], by computing 
the best approximation to the CATT [4] corresponding to X_i, X̄′ & X̄″ by 
computing the T(i) that minimizes e²(i). However, we now use only the subset 
of stable points for this, which provides a more accurate estimate of the CATT 
than reported in [1] and makes the process a lot faster. We then use the computed 
CATT and all the landmark points (not just the stable points) to generate the 
least squares estimate of X̄_i, the mean shape in the view of the given image X_i. 
Next, we compute the in-view shape difference, ΔX_i = X_i − X̄_i, which is then 
added to X̄′ & X̄″ to generate our first estimates of X′_i & X″_i: 

X′_i(temp) = X̄′ + ΔX_i & X″_i(temp) = X̄″ + ΔX_i . (2) 

Since ΔX_i is a shape difference in the view X_i, applying it to the basis views 
will not, in general, lead to a valid result, since X′_i(temp) & X″_i(temp) will not, 
in general, conform to the constraints of multi-view geometry. However, in each 
case it provides a better estimate of the landmarks of the pose aligned image 
than the means X̄′ & X̄″. We continue by extracting, from the CATT, the two 
fundamental matrices that link X_i to each of the basis views via Algorithm 14.1, 
on page 366 of [11]. We then use the fundamental matrices to compute the 
equations of the epipolar lines in each of X′_i(temp) & X″_i(temp) corresponding 
to each landmark point in the given image X_i. Next, we move each landmark 
point in each of X′_i(temp) & X″_i(temp) to the nearest point on the corresponding 
epipolar line to generate X̃′_i & X̃″_i, which are updated estimates of X′_i & 
X″_i. This step ensures that X̃′_i & X̃″_i conform to the multi-view geometry. We 
then align X̃′_i to X̄′ & X̃″_i to X̄″, as in [1], via a further affine transformation 
applied to the landmark points of each image. This is done in order to determine 
all the degrees of freedom in the alignment process. However, now we do not stop 
at this point, but complete the alignment by enforcing a shape correspondence 
between the two sets of pose-aligned points X̃′_i & X̃″_i, as described in Sect. 3. 
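
The step of moving each landmark to the nearest point on its epipolar line can be sketched as below. This is a minimal illustration assuming the convention x_dst^T F x_src = 0 for the fundamental matrix; the function name `snap_to_epipolar_lines` and the numerical values are hypothetical and are not taken from [1] or [11].

```python
import numpy as np

def snap_to_epipolar_lines(F, src_pts, dst_pts):
    """Move each dst point to the nearest point on the epipolar line F @ x_src.

    F: 3x3 fundamental matrix with x_dst^T F x_src = 0 (assumed convention).
    src_pts, dst_pts: (n, 2) arrays of corresponding landmark points.
    """
    src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])
    lines = src_h @ F.T                  # row k holds (a, b, c) of line k
    a, b, c = lines[:, 0], lines[:, 1], lines[:, 2]
    # Signed distance of each dst point from its line, divided by ||(a, b)||.
    s = (a * dst_pts[:, 0] + b * dst_pts[:, 1] + c) / (a**2 + b**2)
    # Orthogonal projection: step back along the line normal (a, b).
    return dst_pts - s[:, None] * lines[:, :2]

# Illustrative affine fundamental matrix (upper-left 2x2 block is zero).
F = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, -2.0],
              [0.5, 1.0, 3.0]])
src = np.array([[0.0, 0.0], [1.0, 2.0], [-3.0, 4.0]])
dst = np.array([[5.0, 1.0], [0.0, 0.0], [2.0, -7.0]])
print(snap_to_epipolar_lines(F, src, dst))
```

After this projection every moved point satisfies the epipolar constraint exactly (up to rounding), which is the sense in which the updated estimates conform to the multi-view geometry.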

3 Enforcing a Shape Correspondence between Views 

Suppose now that we are using the 2D-PA algorithm to align points in an image 
X_i to the points, X̄′ & X̄″, in two given mean basis views. The 2D-PA algorithm 
would generate two sets of points X̃′_i & X̃″_i as explained in Sect. 2.2 above, which 
are aligned as well as possible. Since X̃′_i & X̃″_i may not have the same shape, in 
order to complete the alignment, we need to update the shapes represented by 
each of them until they can be considered as simultaneous images of the same 
3D object (i.e. we need to enforce shape correspondence between X̃′_i & X̃″_i). 
Thus, our aim, here, is to recover X′_i & X″_i, given X̄′, X̄″, X̃′_i & X̃″_i. 

We begin by setting X̃′_i & X̃″_i as our first estimates of X′_i & X″_i, respectively. 
Next, we use Algorithm 13.1, on page 340 of [11], to compute the maximum 
likelihood estimate of the (affine) fundamental matrix that maps X′_i to X̄″. 
Here again we use only the stable points in order to compute an estimate of the 
fundamental matrix, since a rigid object is assumed in the algorithm used. We 
then use the fundamental matrix and all the landmark points (not just the stable 
points) to map the shape of X′_i to X̄″ and generate X̂″_i. This shape transfer 
is achieved, as explained in the 2D-PA process, by moving each landmark point 
to the nearest point along the corresponding epipolar line (see Sect. 2.2). X̂″_i 
corresponds to an image of the shape represented by points X′_i as seen from the 
view of X̄″. We do the same for the pair of images X″_i & X̄′ to generate X̂′_i 
and update our estimates of X′_i & X″_i as follows: 

X′_i ← ½(X′_i + X̂′_i) & X″_i ← ½(X″_i + X̂″_i). (3) 

We then iterate, using our current estimates of X′_i & X″_i, to re-compute 
X̂′_i & X̂″_i and using X̂′_i & X̂″_i to update our estimates of X′_i & X″_i via (3). 
We continue iterating until the difference between consecutive estimates of X′_i 
& X″_i is smaller than some tolerance. 

3.1 The Initial Reference Images 

At this point we recall that in order to begin the 2D-PA algorithm, we require 
landmark points, X̄′ & X̄″, in the two mean basis views. Thus, initially, we 
generate all the distinct combinations of landmark points from image pairs in 
the training set. We consider each pair of images, enforce a shape correspondence 
between them (as explained next), and compute the error defined in (1) 
corresponding to the selected pair. We then select the two images that produce 
the minimum value for e². The two images (with the shape correspondence 
enforced) thus selected, are then used as the initial reference images X_r1 & 
X_r2 in the Extended Procrustes Alignment (EPA) algorithm [1] to compute 
the points in the mean basis views. The EPA algorithm begins by considering 
X_r1 & X_r2 as estimates of X̄′ & X̄″. Then we iterate, aligning points 
in all the training images (via the 2D-PA process) to the current estimates of 
X̄′ & X̄″ and re-computing X̄′ & X̄″ from the aligned sets of images, until 
convergence. Thus, until X̄′ & X̄″ are computed, we use their current estimates 
instead. 

During the process of selecting the two initial reference images for the EPA 
algorithm (see Sect. 3.1), however, we only have two sets of points that correspond 
to X̃′_i & X̃″_i, since X̄′ & X̄″ have not yet been computed. Therefore, 
in order to enforce a shape correspondence between these two sets of points, in 
each iteration we use the current estimates of X′_i & X″_i in place of X̄′ & X̄″. 

4 The Integrated Shape and Pose Model (ISPM) 

The ISPM, first introduced in [1], utilizes landmark points representing two 
flexible basis views to integrate the LCV technique with a CVFSM [2]. To build 
an ISPM, from (almost) any given set of images, we first select two reference 
images that define the basis views (see Sect. 3.1) and use the EPA algorithm [1] 
to simultaneously compute the points representing the two mean basis views and 
align all the training images to them (via the 2D-PA algorithm). This results, for 
each training image, in two corresponding sets of landmark points that represent 
simultaneous images of the object of interest taken from the two selected basis 
views while the object changes only its shape. Thus, we can then build two FSMs 
to model the intrinsic shape variation present in these two sets of landmark points 
and use the correspondence to build a hierarchical CVFSM. The parameters of 
the CVFSM (the shape parameters) enable us simultaneously to change the 
shape of the object in the two basis view images in a corresponding manner. 
Given the points representing the two basis views of the object with a particular 
shape, we may use the reformulated LCV technique [1] to synthesize that shape 
as seen from any desired view point via an appropriate CATT [4]. The elements 
of the CATT are the pose parameters. To use the ISPM, given the landmark 
points in a new image of the object, we first align them, via the 2D-PA algorithm, 
to the points representing the mean basis views. This provides: (i) the CATT 
that defines the pose of the object in the image (i.e. the pose parameters) and, 
(ii) the input set of landmark points to each of the two individual FSMs. The 
parameters of the individual FSMs then provide the input to the CVFSM, from 
which we extract the shape parameters in the usual way [2,3]. 

5 Evaluation 

As a proof-of-principle the updated version of the ISPM detailed in this paper 
was evaluated in comparison to its predecessor [1] and a conventional Flexible 
Shape Model (FSM) as built by Cootes et al. [3]. The evaluation was carried out 
on landmark points selected from both real and synthetic image data. For the 
real images, we used landmark points manually selected from the same data as 
in [1], with five expressions (Neutral, Angry, Happy, Sad & Surprised) sampled at 
13 different poses (≈5° intervals from ≈−30° to ≈+30°, where 0° corresponds 
to the frontal view) giving 65 sets of points in total. For the synthetic data, we 
utilized the 3D head model of Loizides et al. [12] to generate images of a face. A 
subset of the 3D model points was manually selected as landmarks. The error- 
free locations of these landmark points in the corresponding synthetic images, 
and the images themselves were computed via an affine projection matrix. Four 
expressions (Fear, Happiness, Sadness & Neutral) were sampled at 11 different 
poses (at 5° intervals from —25° to +25°) to generate 44 sets of points in total. In 
both cases (real and synthetic), rotations were performed only about the vertical 
axis. Owing to the space limit here, we refer the reader to [13] for a complete 
description of the databases. Some example images and the landmark points 
used in each case are shown in Fig. 1. 

Fig. 1. Some examples of the real (bottom) and synthetic (top) images used in the 
evaluation. The subset of stable points (squares) and the other landmark points (circles) 
used in each case are shown in the leftmost images. 



We evaluated the performance of each model (i.e. of the FSM, the initial 
ISPM [1] and the current ISPM described in this paper) by its ability to recon- 
struct the point configuration in a given image. For this we used each model to 
extract its own representation of a given image and use this representation to 
reconstruct the points representing the original image. The reconstruction er- 
ror was then computed to be the root mean square error between the positions 
of the landmark points in the original image and the points reconstructed by 
the model. We represent this error as a percentage of the scale of the original 
image in order to make it scale invariant. The reconstruction errors were also 
averaged over expression, providing an error measure as a function of pose and 
independent of expression. 
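
A minimal sketch of such a scale-normalised reconstruction error is given below. The text does not say how the scale of an image is measured, so the sketch assumes it is the RMS distance of the landmark points from their centroid; the function name `reconstruction_error_percent` is likewise hypothetical.

```python
import numpy as np

def reconstruction_error_percent(original, reconstructed):
    """RMS landmark error as a percentage of the scale of the original shape."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    # Root mean square of the point-to-point distances.
    rms = np.sqrt(np.mean(np.sum((original - reconstructed) ** 2, axis=1)))
    # Assumed definition of scale: RMS distance from the centroid.
    centred = original - original.mean(axis=0)
    scale = np.sqrt(np.mean(np.sum(centred ** 2, axis=1)))
    return 100.0 * rms / scale

square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
shifted = square + np.array([0.1, 0.0])
print(reconstruction_error_percent(square, shifted))
```

Because both the error and the scale grow linearly with image size, the ratio is unchanged when the original and reconstructed points are scaled by the same factor.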

We performed cross-validation [14] and leave-one-out experiments on the two 
data sets in order to determine the accuracy of each model. The results of the 
leave-one-out experiments are shown in Fig. 2. The cross-validation experiments 
produced similar results. The graphs in Fig. 2 clearly show that the FSM is 
dependent on pose whereas the current ISPM isn’t. Furthermore, in all exper- 
iments, on average the current ISPM out-performed the FSM. The minimum 
error of the initial ISPM was, however, much larger (always > 1.0%) and is 
therefore not shown in the graphs in Fig. 2. Thus, although the initial ISPM was 
pose-invariant, it wasn’t able to rival the FSM in terms of accuracy as the cur- 
rent ISPM does. Since similar results were generated on the real and synthetic 
databases, which were completely different in size, shape, number of landmark 
points and noise level, we are confident they reflect the performance of the mod- 
els of interest and not some peculiarity of a particular database. 

Fig. 2. The reconstruction errors from the leave-one-out experiments on the (a) real 
& (b) synthetic data for the FSM (dotted line) and the current ISPM (solid line). 

We also evaluated the performance of our algorithm that enforces a shape 
correspondence between two views (see Sect. 3) by computing the Pearson 
Correlation Coefficient (PCC) between the eigenvectors, eigenvalues and the scatter 
matrices of the two individual FSMs that were built from the pose-aligned 
images. We use the absolute value of the PCC which is, by definition, between 0 
(no apparent correlation) and 1 (highly correlated). In all experiments the PCC 
values for the current ISPM were above 0.9, which shows that a shape correspon- 
dence was successfully enforced. Except for the PCC values for the eigenvalues 
(0.9) and of the first eigenvector (0.8), all the other PCC values for the initial 
ISPM were below 0.7. Since space is limited here, we refer the reader to [13] for 
more details on our results. 
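
As an illustration of the measure used here, the absolute PCC between two corresponding quantities (for example, an eigenvector from each of the two FSMs) could be computed as in the following minimal sketch. The function name `abs_pcc` and the sample vectors are hypothetical; the data of [13] are not reproduced.

```python
import numpy as np

def abs_pcc(u, v):
    """Absolute Pearson correlation coefficient between two flattened arrays."""
    u = np.ravel(np.asarray(u, dtype=float))
    v = np.ravel(np.asarray(v, dtype=float))
    return abs(float(np.corrcoef(u, v)[0, 1]))

v1 = np.array([0.5, -0.1, 0.7, 0.2])
v2 = -2.0 * v1 + 1.0        # perfectly (negatively) correlated with v1
print(abs_pcc(v1, v2))
```

Taking the absolute value makes the measure insensitive to the arbitrary sign of an eigenvector.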

6 Conclusions and Future Work 

We have developed an algorithm capable of enforcing a shape correspondence 
between two views of the same 3D non-rigid object in different shape-states. We 
have used this algorithm, along with many other significant updates, to improve 
the version of the Integrated Shape and Pose Model (ISPM) described in [1] 
by a factor of 10. As a proof-of-principle we have evaluated the performance 
of the improved ISPM in comparison to that of its predecessor [1] and of the 
conventional FSM [3], on two different databases via cross-validation and 
leave-one-out experiments. The results show that, unlike the FSM, the current ISPM is 
view-invariant since we separate the extrinsic (pose) variations from the intrinsic 
(shape) variations. Furthermore, on average the current ISPM described in this 
paper out-performs the FSM, while also completely out-classing the initial ISPM. 
The algorithm that enforces a shape correspondence between two views was also 
evaluated and shown to be successful. We anticipate that the ISPM will be useful 
in a variety of applications including calculation of head pose and view-invariant 
expression recognition. The approach may also be of relevance to theories of 
human vision. 

Abstract. Colour is an important cue in many applications in machine 
vision and image processing. Nevertheless, colour greatly depends upon 
illumination changes. The goal of colour constancy is to keep colour images 
stable. This paper’s contribution to colour constancy lies in estimating 
both the set and the likelihood of feasible colour mappings. Then, the 
most likely mapping is selected and the image is rendered as it would 
be seen under a canonical illuminant. This approach is helpful in tasks 
where light can be neither controlled nor easily measured since it only 
makes use of image data, avoiding a common drawback in other colour 
constancy algorithms. Finally, we check its performance using several 
sets of images of objects under quite different illuminants and the results 
are compared to those obtained if the true illuminant colour were known. 

Keywords: Colour, colour mappings, colour change, colour constancy, 
colour histograms. 



1 Introduction 

In a number of applications from machine vision tasks such as object recognition, 
image indexing and retrieval, to digital photography or new multimedia applica- 
tions, it is important that the recorded colours remain constant under changes 
in the scene illumination. Hence, a preliminary step when using colour must be 
to remove the distracting effect of the illumination change. This problem is usu- 
ally referred to in the literature as colour constancy, i.e., the stability of surface 
colour appearance under varying illumination conditions. Part of the difficulty 
is that this problem is entangled with other confounding phenomena such as 
the shape of objects, viewing and illumination geometry, besides the changes in 
the illuminant spectral power distribution and the reflectance properties of the 
imaged objects. 

A general approach to colour constancy is to recover a descriptor for each 
different surface in a scene as it would be seen by a camera under a canonical 
illuminant. This is similar to posing the problem as that of recovering an estimate 
of the colour of the scene illumination from an image taken under an unknown 
illumination, since it is relatively straightforward to map image colours back to 
illuminant independent descriptors [1]. 

* Partially funded by a grant of the Gov. of Catalonia and the CICyT DPI2001-2223. 

Therefore, finding a mapping between colours and finding the colour of the scene 
illuminant are equivalent problems. This path has been followed by a great number 
of algorithms, those related to gamut-mapping being the most successful 
[2,3,4,5]. 

Lately, the trend has slightly changed to make a guess on the illumination, 
as in colour-by-correlation [1] or colour-voting [6], rather than attempting to 
recover only one single estimate of the illuminant. A measure of the likelihood 
that each of a set of feasible illuminants was the scene illuminant is set out 
instead, which is afterwards used to select the corresponding mapping to render 
the image back into the canonical illuminant. 



2 Discussion 

These approaches have two common drawbacks. First, as a rule, all of them rely 
on the fact that the set of all possible colours seen under a canonical illuminant 
is, somehow, known and available. That is, we must know a priori how any 
possible surface will appear in an image. 

The collection of gamut-mapping algorithms uses them to constrain the set 
of feasible mappings, while the colour-by-correlation algorithm builds the cor- 
relation matrix up with them, which in addition implies that this set of colours 
must be known for each single illumination taken into account. 

Secondly, in gamut-mapping algorithms the set of realizable illuminants also 
needs to be known a priori to restrict the feasible transformations. Besides, while 
this set is a convex hull in the gamut-mapping family, it is a finite set in the 
colour-by-correlation algorithm not covering any intermediate illuminant. 

In short, before any of the previous colour constancy algorithms can even 
be set to work, a considerable amount of a priori knowledge about reflectances and 
lights is needed, which reduces the scope of those methods. We point out this 
shortcoming in two basic tasks where a mechanism of colour constancy is required [7], 
namely, colour indexing and colour-based object recognition. In both cases, it may 
be very difficult or simply impossible to have an a priori realistic database of 
surface and illuminant colours. Image indexing may use images of unknown 
origin, such as those from the Internet, while recognition may be part of a higher-level 
task where light conditions are uncontrollable or unknown. 

Thus, this paper suggests a less information-dependent colour constancy al- 
gorithm which just relies on pixels and is capable of rendering images from an 
unknown illumination back into a task-dependent canonic illuminant. We only 
require that the set of images to transform shows similar scenes without caring 
about the number of imaged objects since no segmentation is carried out. 

3 Diagonal Model and Chromaticity Coordinates 



First of all, the problem of modelling the colour change must be considered. 
According to the literature, from Forsyth [2] to Finlayson et al. [3,1], the algorithms 
with best performance are based on a diagonal model, i.e., colours recorded 
under one illuminant can be mapped onto those under a different illuminant by 
applying individual scaling factors to each coordinate. Forsyth’s gamut-mapping 
algorithm used 3D diagonal matrices to transform RGB sensor responses: 

(R′, G′, B′)^t = diag(α, β, γ) (R, G, B)^t (1) 



That algorithm worked well only on a restricted set of images which included 
flat, matte, uniformly illuminated scenes. To alleviate problems found in images 
with specularities or shape information and to reduce the computational burden, 
Finlayson [3] discarded intensity information by just working in a 2D chromaticity 
space usually referred to as perspective colour coordinates: 

(r, g) = (R/B, G/B) (2) 

Therefore, the diagonal matrix of Eq. (1) expressed in perspective coordinates 
changes into the following relation: 

(r′, g′)^t = diag(α/γ, β/γ) (r, g)^t = diag(α̃, β̃) (r, g)^t (3) 



Later, Finlayson and Hordley proved in [8] that there is no further advan- 
tage in using 3D algorithms because the set of feasible mappings after being 
projected into 2D is the same as the set computed by 2D algorithms. Hence, 
both chromaticity coordinates in Eq. (2) and 2D diagonal mappings in Eq. (3) 
will be used throughout this paper. 
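
The relation between the 3D and the 2D diagonal models can be checked numerically. The sketch below assumes that the perspective chromaticities are (r, g) = (R/B, G/B); the function name `chromaticity`, the scaling factors and the pixel values are illustrative only.

```python
import numpy as np

def chromaticity(rgb):
    """Perspective chromaticities (r, g) = (R/B, G/B); assumes B > 0."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb[..., :2] / rgb[..., 2:3]

alpha, beta, gamma = 1.2, 0.9, 0.6           # illustrative scaling factors
rgb = np.array([[120.0, 80.0, 40.0],
                [30.0, 200.0, 90.0]])

# 3D diagonal model applied to RGB, then projected to chromaticities.
rg_direct = chromaticity(rgb * np.array([alpha, beta, gamma]))
# 2D diagonal model applied directly to the chromaticities.
rg_via_2d = chromaticity(rgb) * np.array([alpha / gamma, beta / gamma])

print(np.allclose(rg_direct, rg_via_2d))
```

Both routes give the same chromaticities because the common factor γ applied to B cancels in the two ratios, leaving only the two free parameters α/γ and β/γ.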



4 Measuring the Performance of Colour Constancy 

The performance of a colour constancy algorithm is usually measured as the 
error of the illuminant estimates or the RMS error between the transformed and 
canonic images, which is useless if the point of view changes or objects move. 

As reported in [9,7], colour histograms are an alternative way to globally represent 
and compare images. Thus, the Swain & Ballard intersection-measurement 
in [9] computes the resemblance¹ between two histograms H and T: 

∩(H, T) = Σ_k min{H_k, T_k} ∈ [0, 1] (4) 

¹ A distance measure can be similarly defined as Dist(H, T) = 1 − ∩(H, T) ∈ [0, 1]. 


The advantages of this measure are that it is very fast to compute if compared 
to other matching functions [10], and more importantly, if histograms are sparse 
and colours equally probable, this is a robust way of comparing images [9,10]. 
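
A minimal sketch of the intersection measure follows. It assumes that both histograms are normalised to unit mass, which is what keeps the result in [0, 1]; the function name `histogram_intersection` and the sample bin counts are hypothetical.

```python
import numpy as np

def histogram_intersection(h, t):
    """Swain & Ballard style intersection of two histograms, in [0, 1]."""
    h = np.asarray(h, dtype=float)
    t = np.asarray(t, dtype=float)
    # Assumed normalisation: each histogram sums to one.
    h = h / h.sum()
    t = t / t.sum()
    return float(np.minimum(h, t).sum())

h = [2, 2, 0, 0]
t = [1, 0, 1, 2]
print(histogram_intersection(h, t))        # similarity
print(1.0 - histogram_intersection(h, t))  # corresponding distance
```

Identical histograms give an intersection of 1 and histograms with no occupied bin in common give 0, so the measure can be read directly as a similarity score.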

Since both colour indexing and colour-based object recognition using the
Swain & Ballard measure fail miserably when the scene light differs from that used
in creating the database of model images [7], we suggest Eq. (4) as a means of
both measuring the performance of a colour constancy algorithm and assessing
the suitability of a particular colour mapping, where one histogram
corresponds to a transformed image and the other to the canonic one.
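
The intersection measure of Eq. (4) and its associated distance can be sketched in a few lines of Python. This is a minimal illustration that assumes both histograms have the same bins and are normalised to unit sum, so that the result lies in [0, 1]; the function names and the sample counts are hypothetical.

```python
def normalise(hist):
    """Turn raw bin counts into relative frequencies that sum to one."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]


def intersection(h, t):
    """Histogram intersection of Eq. (4): sum over bins of min(H_k, T_k)."""
    if len(h) != len(t):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(hk, tk) for hk, tk in zip(h, t))


def distance(h, t):
    """Distance derived from the intersection: 1 - intersection."""
    return 1.0 - intersection(h, t)


# Hypothetical bin counts of a canonic image and of a transformed image.
canonic = normalise([4, 0, 2, 2])
transformed = normalise([2, 2, 0, 4])

resemblance = intersection(canonic, transformed)
difference = distance(canonic, transformed)
```

Identical histograms give an intersection of one and a distance of zero, which is the behaviour the performance measure relies on.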



5 Colour Constancy Algorithm 

We suggest an algorithm to estimate the set and likelihood of feasible colour 
mappings from image pixels. This set is analogous to the set of possible mappings 
in [2,3,8], but here the likelihood of each mapping is computed, as in [1]. 

The algorithm supposes we have images of similar scenes under different
illuminants and that we want to render them as seen under a canonic illuminant².
The number of objects in the scene does not matter since we do not segment the
image, and only the pixels are used to find a colour mapping such as those of Eq. (3).

More precisely, let $I^a$ and $I^b$ be two colour images of nearly the same scene
taken under different and unknown illuminants. We take $I^b$ as the canonic and
our goal is to find a colour transformation $T \in \mathcal{T}$ which maps the colours of
the pixels of image $I^a$ as close as possible onto those of image $I^b$. $\mathcal{T}$ is the set
of feasible colour mappings. We denote the transformed image as $T(I)$, which is
formed by applying $T$ to every pixel in $I$.

The main idea of this algorithm is to estimate the likelihood $\mathcal{L}(T \mid I^a, I^b)$
of every feasible mapping $T \in \mathcal{T}$ just from pixel data of images $I^a$ and $I^b$.
Afterwards, we will select the most likely transformation $T_0$:

$$T_0 = \arg\max_{T \in \mathcal{T}} \{\mathcal{L}(T \mid I^a, I^b)\} \qquad (5)$$

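According to Eq. (3), $T = diag(\tilde{\alpha}, \tilde{\beta})$, where $\tilde{\alpha}, \tilde{\beta} \in [1/255, 255]$. Therefore,
for every pair of chromaticities $(r^a, g^a) \in I^a$ and $(r^b, g^b) \in I^b$ it is possible to
compute the transformation relating them as the quotient:

$$T = (\tilde{\alpha}, \tilde{\beta}) = (r^b / r^a,\; g^b / g^a) \qquad (6)$$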

Extending these quotients to all the pixels in $I^a$ and $I^b$, the set of all the
feasible transformations can be computed as
$\mathcal{T} = \{(r_i^b / r_j^a,\; g_i^b / g_j^a) \mid (r_j^a, g_j^a) \in I^a \text{ and } (r_i^b, g_i^b) \in I^b\}$,
where $j, i = 1, \ldots, N$ correspond to the $j$-th and $i$-th pixels
of images $I^a$ and $I^b$, respectively. $N$ is the total number of pixels of an image. $\mathcal{T}$
could be further constrained if any extra knowledge about surface or illuminant
colour were available.

² What is canonic is a convenience, so any illuminant could be the canonic one.

Whether this is the case or not, the key idea is that the more appropriate a mapping
is, the more occurrences it must have in $\mathcal{T}$. Once the set $\mathcal{T}$ and its histogram $\mathcal{H}(\mathcal{T})$
are obtained, the probability of a certain $T \in \mathcal{T}$, $Pr(T \mid I^a, I^b)$, can be estimated
as the relative frequency of the bin corresponding to $T$ in the histogram $\mathcal{H}(\mathcal{T})$.
In this way, a likelihood function depending on $T$ can be defined as:

$$\mathcal{L}(T \mid I^a, I^b) = \log(Pr(T \mid I^a, I^b)), \quad T \in \mathcal{T} \qquad (7)$$

where, according to Eqs. (5) and (7), the most likely mapping $T_0$ corresponds
to the bin of highest relative frequency in $\mathcal{H}(\mathcal{T})$, fulfilling the idea that the most
appropriate mapping must have the most occurrences.
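
The voting scheme behind Eqs. (5) to (7) can be sketched directly on pixel chromaticities. The Python code below is a minimal illustration, not the authors' implementation: the bin width, the function name and the sample data are assumptions, chromaticities are assumed to be non-zero, and the mappings falling into the winning bin are averaged to obtain the estimate.

```python
import math
from collections import Counter


def estimate_mapping(chroma_a, chroma_b, bin_width=0.25):
    """Brute-force vote over all pixel pairs; cost grows as N squared."""
    counts = Counter()
    sums = {}
    for ra, ga in chroma_a:
        for rb, gb in chroma_b:
            # Quotient of Eq. (6); assumes non-zero chromaticities.
            alpha, beta = rb / ra, gb / ga
            key = (math.floor(alpha / bin_width), math.floor(beta / bin_width))
            counts[key] += 1
            sa, sb = sums.get(key, (0.0, 0.0))
            sums[key] = (sa + alpha, sb + beta)
    total = sum(counts.values())
    best, votes = counts.most_common(1)[0]
    # Average of the mappings that fell into the most voted bin.
    mapping = (sums[best][0] / votes, sums[best][1] / votes)
    # Log of the relative frequency of that bin, as in Eq. (7).
    return mapping, math.log(votes / total)


# Hypothetical chromaticities; the second image is the first one mapped by (1.5, 0.5).
chroma_a = [(1.0, 2.0), (2.0, 1.0), (4.0, 4.0)]
chroma_b = [(1.5 * r, 0.5 * g) for r, g in chroma_a]

mapping, log_likelihood = estimate_mapping(chroma_a, chroma_b)
```

In this toy case the three correct pixel pairs vote for the same bin while every mismatched pair lands in a different one, so the true mapping gathers the most occurrences.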

Unfortunately, the previous approach needs a large number of computations,
$O(N^2)$, to build the set $\mathcal{T}$, and resources to store it. To alleviate those
computations, a far better approach is the use of image histograms rather than pixels.

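It is possible to construct the histogram of mappings $\mathcal{H}(\mathcal{T})$ and to estimate
the probability $Pr(T \mid I^a, I^b)$ by means of the chromaticity histograms
$H^a = \mathcal{H}(I^a)$ and $H^b = \mathcal{H}(I^b)$ of images $I^a$ and $I^b$, respectively. The relative
frequency of each bin in $\mathcal{H}(\mathcal{T})$ is the summation of the frequencies of each pair
of chromaticities giving rise to a certain mapping $T$ by means of Eq. (6):

$$Pr(T \mid I^a, I^b) = \sum_{T \cap \mathcal{T}} Pr((r^a, g^a), (r^b, g^b) \mid I^a, I^b), \quad T \in \mathcal{T} \qquad (8)$$

where $T \cap \mathcal{T} = \{(r^a, g^a) \in I^a \text{ and } (r^b, g^b) \in I^b \mid T = (r^b / r^a,\; g^b / g^a)\}$. The probability
of every element of $T \cap \mathcal{T}$ is:

$$Pr((r^a, g^a), (r^b, g^b) \mid I^a, I^b) = Pr((r^a, g^a) \mid I^a) \cdot Pr((r^b, g^b) \mid I^b) \qquad (9)$$

where $Pr((r^a, g^a) \mid I^a)$ and $Pr((r^b, g^b) \mid I^b)$ are the relative frequencies of
chromaticities $(r^a, g^a) \in H^a$ and $(r^b, g^b) \in H^b$, respectively.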

This procedure greatly reduces the number of computations to less than
$O(M^2)$, where $M$ is the number of bins in a histogram, since only non-zero
bins are taken into account and $M \ll N$. We average the set of mappings
falling into a bin to get a better estimate of the mapping corresponding to
that bin. The number of histogram bins affects the precision of the mapping
estimate only if it is too low.
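
The histogram-based estimate of Eqs. (8) and (9) can be sketched as follows. This Python code is an illustrative assumption rather than the published implementation: chromaticity histograms are represented as dictionaries from a (non-zero) bin chromaticity to its relative frequency, only non-zero bins are stored, and the bin width of the mapping histogram is a hypothetical parameter.

```python
import math
from collections import defaultdict


def mapping_histogram(hist_a, hist_b, bin_width=0.25):
    """Accumulate Pr(T | Ia, Ib) over pairs of non-zero chromaticity bins."""
    prob = defaultdict(float)
    weighted = defaultdict(lambda: [0.0, 0.0])
    for (ra, ga), pa in hist_a.items():
        for (rb, gb), pb in hist_b.items():
            weight = pa * pb                    # Eq. (9): product of frequencies
            alpha, beta = rb / ra, gb / ga      # Eq. (6): quotient of chromaticities
            key = (math.floor(alpha / bin_width), math.floor(beta / bin_width))
            prob[key] += weight                 # Eq. (8): sum over contributing pairs
            weighted[key][0] += weight * alpha
            weighted[key][1] += weight * beta
    # For every bin: its probability and the weighted mean of its mappings.
    return {k: (p, (weighted[k][0] / p, weighted[k][1] / p)) for k, p in prob.items()}


# Hypothetical histograms; the second one is the first mapped by (1.5, 0.5).
hist_a = {(1.0, 2.0): 0.5, (2.0, 1.0): 0.25, (4.0, 4.0): 0.25}
hist_b = {(1.5, 1.0): 0.5, (3.0, 0.5): 0.25, (6.0, 2.0): 0.25}

bins = mapping_histogram(hist_a, hist_b)
best_prob, best_mapping = max(bins.values(), key=lambda entry: entry[0])
```

The double loop runs over pairs of occupied bins instead of pairs of pixels, which is where the saving with respect to the pixel-based version comes from.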

In practice, some spurious peaks may appear due to the accumulation of
noisy or redundant mappings, which might mislead the algorithm. Hence,
the intersection measure of Eq. (4) is used to evaluate the performance of each
particular mapping, since it globally measures the colour resemblance between
two images. The better a mapping is, the higher the histogram intersection is.

Therefore, to improve the chances of obtaining a more precise estimate, we
redefine the likelihood function by combining both Eq. (4) and Eq. (8) as:

$$\mathcal{L}(T \mid I^a, I^b) = \log(\cap(T(H^a), H^b) \cdot Pr(T \mid I^a, I^b)), \quad T \in \mathcal{T} \qquad (10)$$

where $T(H) = T(\mathcal{H}(I))$ is the transformation of a histogram $\mathcal{H}(I)$ by $T$, which
is not as straightforward as mapping an image $I$, since the discrete nature of
histograms and the absence of a one-to-one correspondence among histogram
bins generally produce gaps and bin overlays.

To avoid gaps, the procedure begins from the bins in $T(H)$ and computes
their corresponding bin in $H$ using the inverse $T^{-1}$. Bin overlays mean that
some bins may have been repeatedly counted. Hence, $T(H)$ must be normalised.

Furthermore, the previous likelihood function is only computed on a limited
set of mappings to reduce the computational burden. Only those of higher
probability $Pr(T \mid I^a, I^b)$ are checked by Eq. (4) to be a good mapping. Finally, the
most likely transformation $T_0$ is selected, as stated in Eq. (5).
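
The histogram transformation and the combined likelihood of Eq. (10) can be sketched as below. This Python code is a simplified illustration under stated assumptions: histograms are dictionaries indexed by integer bin coordinates on a square grid, bin centres are used as representative chromaticities, and all names and sample values are hypothetical.

```python
import math


def transform_histogram(hist, mapping, n_bins, bin_width):
    """Build T(H) by visiting its bins and looking up the source bin via the inverse of T."""
    alpha, beta = mapping
    out = {}
    for i in range(n_bins):
        for j in range(n_bins):
            r = (i + 0.5) * bin_width           # centre of the target bin
            g = (j + 0.5) * bin_width
            # Inverse mapping: divide by the diagonal entries to find the source bin.
            src = (math.floor(r / alpha / bin_width), math.floor(g / beta / bin_width))
            value = hist.get(src, 0.0)
            if value > 0.0:
                out[(i, j)] = value
    # A source bin may be counted several times (bin overlays), so renormalise.
    total = sum(out.values())
    return {k: v / total for k, v in out.items()} if total > 0 else {}


def combined_log_likelihood(hist_a, hist_b, mapping, prob, n_bins, bin_width):
    """Eq. (10): log of the intersection of T(Ha) with Hb times Pr(T | Ia, Ib)."""
    mapped = transform_histogram(hist_a, mapping, n_bins, bin_width)
    inter = sum(min(v, hist_b.get(k, 0.0)) for k, v in mapped.items())
    score = inter * prob
    return math.log(score) if score > 0 else float("-inf")


# Hypothetical histogram with two occupied bins, stretched by a factor of two.
hist_a = {(0, 0): 0.5, (1, 1): 0.5}
mapped = transform_histogram(hist_a, (2.0, 2.0), n_bins=4, bin_width=1.0)
log_lik = combined_log_likelihood(hist_a, mapped, (2.0, 2.0), 0.375, 4, 1.0)
```

Each source bin is read by four target bins in this example, which is the repeated counting that the normalisation step compensates for.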



6 Results 

In this section, we apply the previous algorithm to a set of 220 images of
20 different colourful objects taken under 11 different illuminants³. We show
the set of objects in Fig. 1. We have chosen this image database to benchmark
the algorithm since it presents a wide range of both real objects and lights.




Fig. 1. Set of objects. 



The experiment consists, for each object, in taking each illuminant in turn
as the canonic one while computing colour mappings from the rest of the
illuminants onto the canonic. We measure the performance of each computed
mapping by comparing the chromaticity histogram of the transformed image with
that of the canonic one by means of the distance between histograms defined
from Eq. (4).

³ These sets belong to the public database of the Computational Vision Lab at
Simon Fraser University, located at URL: http://www.cs.sfu.ca/~colour/.

Fig. 2. Boxplots of the results per object set and method.

Fig. 3. Mean, median and standard deviation of the results per method.

To compare the results with a ground truth, we directly calculate the colour
transformation from the real illuminant colour. That is, if two illuminants $a$
and $b$ have colours $(R^a, G^a, B^a)$ and $(R^b, G^b, B^b)$, respectively, the change
from $a$ onto $b$ is $(\alpha_0, \beta_0) = (r^b / r^a,\; g^b / g^a)$, where $(r^a, g^a)$ and $(r^b, g^b)$ are the
illuminant chromaticities, according to Eq. (2). This information was measured
at the same time as the image database using a diffuse white surface in the scene
[4,5]. The mappings computed in this way are the limit of performance of any
colour constancy algorithm using Eq. (3) to model the colour change.
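
The ground-truth mapping described above reduces to a quotient of illuminant chromaticities. The Python sketch below is illustrative only: it assumes the perspective coordinates (R/B, G/B) for Eq. (2), and the illuminant colours are hypothetical values, not measurements from the database.

```python
def chromaticity(rgb):
    """Perspective colour coordinates, assumed here to be (R/B, G/B)."""
    R, G, B = rgb
    return (R / B, G / B)


def ground_truth_mapping(illum_a, illum_b):
    """Diagonal mapping (alpha_0, beta_0) taking illuminant a onto illuminant b."""
    ra, ga = chromaticity(illum_a)
    rb, gb = chromaticity(illum_b)
    return (rb / ra, gb / ga)


# Hypothetical illuminant colours measured on a diffuse white surface.
reddish = (200.0, 100.0, 50.0)
neutral = (100.0, 100.0, 100.0)

alpha_0, beta_0 = ground_truth_mapping(reddish, neutral)
```

Applying this mapping to the chromaticity of the reddish illuminant gives the chromaticity of the neutral one, which is what makes it a reference for the estimated mappings.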

In Fig. 2, for each object, we plot the histogram distances in three sets:
in blue (1), those obtained when no colour correction is carried out; in red (2),
those obtained using the real illuminant colour; and in green (3), those of our
algorithm. Each set is depicted as a boxplot, where the three quartiles form the
box with a notch at the median, and the maximum and the minimum are the
extrema of the bars. It can be seen in all the sets that the colour difference
has been reduced with respect to not doing any colour correction (blue sets).
More importantly, the performance of the algorithm is close to that of the
mappings computed from the real illuminant colour (red sets).



Table 1. Global results per method.

Method      Blue (1)    Red (2)    Green (3)
Mean        0.398       0.166      0.186
Median      0.346       0.097      0.118
St. Dev.    0.055       0.040      0.046



To globally describe the performance, we put together the former results and 
compute the mean, the median and the standard deviation for each category, as 
can be seen in Table 1 and Fig. 3. Thus, we can state that globally the colour 
difference has decreased from 0.394 to 0.186, a percentage reduction of 56.6%. 
Secondly, these values are close to those obtained when using the true illuminant 
colour, i.e., a distance of 0.166 and a percentage reduction of 60.6%. 



7 Conclusions 

The present paper shows a procedure based on image raw data that, in a frame- 
work where the colour change is modelled as a 2D diagonal matrix, finds a colour 
mapping so that the image colours can be rendered as seen under a canonic illu- 
mination reducing their dependence on the light conditions. The performance of 
the algorithm was checked with a wide range of real images of objects under dif- 
ferent illuminants. The results show its performance is comparable to the case of 
knowing the real illuminant colour. Finally, we can state that our algorithm improves
colour images since it stabilises pixel colours against illuminant changes.
Abstract. Surface reconstruction from parallel cross sections is an important
problem in medical imaging and other object-modeling applications. Shape and 
topological differences between object contours in adjacent sections cause se- 
vere difficulties in the reconstruction process. A way to approach this problem 
is using the skeleton to create intermediate sections that represent the place 
where the ramifications occur. Several authors have proposed previously the 
use of some type of skeleton to face the problem, but in an intuitive way and 
without giving a basis that guarantees a complete and correct use. In this paper, 
the foundations of the use of the skeleton to reconstruct a surface from cross 
sections are expounded. Some results of an algorithm that is based on these 
foundations and has been recently proposed by the authors are shown that illus- 
trate the excellent performance of the method in especially difficult cases not 
solved previously. 



1 Introduction 

The problem of reconstructing the surface of a solid object from a series of parallel
planar cross sections (referred to hereinafter simply as sections) has attracted
attention in the Computer Graphics and Vision literature during the past three
decades (see [1,6,9,10,11,13]). This important problem is found, for instance, in the
processing of medical images that represent cross sections of the human body interior
and are obtained through non-invasive methods like Computerized Tomography (CT)
and Magnetic Resonance. Other applications are the non-destructive digitization of
three-dimensional (3D) objects from their slices and the reconstruction of 3D models
of the terrain from topographic elevation contours.

In general, the data consist of a series of sections separated by a constant
distance. Each of them is formed by a set of closed contours that define the
boundary of the material of interest to be modeled. The problem lies in finding a
set of planar closed convex polygons (usually triangles) that connect the vertices of
the contours, so that the surface of a geometrically complete object (see definition in
[3]) is built. As the sections are consecutive, the problem can be reduced to that of
finding the set of polygons that join the contour vertices corresponding to each pair of
adjacent sections.

Until the early eighties, the methods proposed to solve the problem
forced the connection of each vertex of a section with some vertices of the adjacent
sections. However, as a certain distance separates the sections, the information about
the places where the ramifications occur in the surface of interest is often missing. This
causes differences in shape (Fig. 1a) and in the number of contours (Fig. 1d) between
adjacent sections. In these cases, the aforementioned restriction not only makes
the treatment of the ramifications impossible (Fig. 1d), but also causes an unrealistic
tiling, even producing unavoidable intersections between the triangles that are
formed (Fig. 1b).





Fig. 1. Adjacent sections with shape differences. a), b) and c) are modified from [1]

A way to approach this problem is to create intermediate sections that represent the
place where the ramifications occur (dotted line in Figs. 1c, e). To this end, several
authors [1,6,8,9,10] have previously proposed the use of a skeleton, but in an intuitive
way and without giving a basis that guarantees a complete and correct use.

The previous related works that approach the problem using some type of skeleton
are discussed in the next section. The relationships between the concepts of image,
skeleton and section are studied in Sections 3 and 4, and they constitute the
foundations of the use of the skeleton to reconstruct a surface from cross sections.
Finally, some results of an algorithm recently proposed by the authors, which is based
on these foundations [11], are shown to illustrate the excellent performance of the method.



2 Overview of Previous Related Work 

Sloan and Hrechanyk [13] were the first to suggest the creation of artificial
intermediate sections between adjacent sections in the cases where these sections are
very different. The tiling between this artificial section and each of the two
that originated it could then be made using any of the proposed methods. In this way, the
model would better fit reality, representing the place where the ramification
occurs in the intermediate section.

However, it is not until the work of Levin [8] that the first method is proposed
that builds a set of intermediate contours between contours of adjacent original
sections in order to solve the ramification and tiling problems. Levin's method is
based on calculating the distance field for each point of each section. This value is the
signed distance between the analyzed point and its nearest contour. In terms of
distance fields, contours can be regarded as isocurves with an isovalue of zero. The value
is positive or negative depending on whether the point is inside or outside the contour,
respectively. The distance fields of the intermediate section are obtained by adding,
for each point, the values of the corresponding distance fields in the original
contiguous sections. The main limitation of this method is the very large number of
triangles that the obtained surface presents.

The polygonal form of skeleton called medial axis is used for the first time in
Meyers' doctoral thesis [9]. However, it is not inserted between adjacent sections, but
used to obtain information about the neighbourhood relationships among the regions
where ramifications occur. The form of skeleton used was called shaved medial axis
(SMA) (Fig. 1d) and the possible types of connections among its loops helped to
classify the ramifications. The method does not work correctly in the cases of
ramifications from many-to-many contours. In addition, the projections on the same
plane of the contours related to the ramification can intersect each other.

Bajaj et al. [1] detected the parts of contours with very different shape and applied 
a method similar to the one used by Geiger [4] to tile them. This method requires the 
edge Voronoi diagram (EVD), but due to the difficulty in implementing a numerically 
stable algorithm, Bajaj et al. proposed to find a rough medial axis using an iterative 
decomposition of the polygon, in which cutting edges are added until all polygons are 
convex. The authors did not specify how to implement this decomposition. 

Oliva et al. [10] used a new type of skeleton called angular bisector network 
(ABN) that was calculated as an approximation to the EVD. Each segment of the ana- 
lyzed contours was associated with a cell of the ABN. The cells that guarantee a cer- 
tain level of proximity can be triangulated in a straightforward way. Otherwise, an in- 
termediate contour is inserted that consists of the common border between cells 
corresponding to contour segments in different sections. This procedure can be re- 
peated recursively until all the cells are triangulated. 

In a recent work, Klein et al. [6] presented an algorithm that combines the ap- 
proach proposed by Levin and the recursive triangulation proposed by Oliva et al. In- 
stead of the complex calculation of the ABN, Klein et al. computed discrete distance 
fields to define intermediate contours and the needed correspondences. The main ad- 
vantage of the method proposed is the use of the z-buffer of standard graphics hard- 
ware to obtain the medial axis that separates the projections of the analyzed contours 
and the proximity correspondences between the vertices of the medial axis and those 
of the analyzed contours. All the examples shown by Klein et al. were artificial and 
none of them included holes. 



3 Relationship between Image and Skeleton 



3.1 Definitions 

An image can be defined as a function $I: N \times N \to G$ where $I(x, y)$ is the illumination
of a pixel with spatial coordinates (x, y) belonging to the set N of natural numbers and
G is the set of positive integer numbers representing their illumination. In this work,
binary images will be used, where the two values represent the background and the
object (white and black pixels in Fig. 2a, respectively).

Fig. 2. Binary image (a), skeleton over it drawn in white (b) and pixel connectivity (c-e)

The skeleton or medial axis can be defined, in a general way, as a set of connected
lines or curves that are equidistant with respect to the borders or limits of a figure
[12]. If the figure is represented by a binary image, the skeleton is its narrowest
representation (Fig. 2b). A specific definition is: the skeleton E(I) of an image I is a set of
points p located inside the boundary of I such that there exist, for each one of them,
at least two points belonging to the boundary that are separated at a minimum
distance from it (and, therefore, p is halfway).

The process by which the skeleton of an image is obtained is called
skeletonization. Most skeletonization algorithms erode the borders of the binary
image repeatedly until narrow lines or single pixels remain. This erosion process is also
known as thinning. Taking into account the comparative analysis of twenty thinning
algorithms carried out in [7], we selected, among the algorithms that preserve
connectivity, the Suzuki-Abe algorithm [14] to be used in this work, due to its high speed,
simplicity and demonstrated success, even in recent works [5].



3.2 Connectivity of Skeleton Vertices 

By definition of skeleton, the connectivity of each one of its points is determined by
the number of pixels that belong to the boundary of I and are at a minimum distance
from it. This means that the connectivity of the E(I) pixels is determined by the shape
of the original image boundary. When the shape is similar along a certain trajectory,
what is obtained is just a path for this trajectory. For example, the skeleton of an
image that represents a hand-written letter is an approximation of the path that the
pencil tip follows when drawing it.

Thus, each pixel of E(I) can be classified according to its connectivity. A terminal
pixel has connectivity one and is caused by the presence of a local maximum in the I
boundary. A terminal pixel appears at the end of a segment where a protuberance or
local convex shape occurs in I (Fig. 2e). An intermediate pixel has connectivity two
and is obtained when the I boundary presents a similar shape on both sides of the line
that goes approximately through the pixel and its two adjacent pixels (Fig. 2d). Lastly,
a branch pixel has connectivity greater than two and is caused by a ramification
involving two or more trajectories. A branch pixel appears in the place where the shape
of I ramifies (Fig. 2c). Terminal and branch pixels will be called extreme pixels.

Analyzing the existing connections among the different skeleton pixels, a skeleton
can be considered as a set of extreme pixels that are connected to each other through
zero or more intermediate pixels. A group of intermediate pixels that connect a pair of
extreme pixels will be called a rail of the skeleton. Note that a rail is equidistant from
two portions of the image boundary that present a similar shape along it.
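
The classification into terminal, intermediate and branch pixels can be illustrated with a short Python sketch. It is a simplification under a stated assumption: connectivity is approximated by counting the 8-neighbours of a pixel that also belong to the skeleton, which works for one-pixel-wide skeletons but can mislabel pixels right next to a junction. The function name and the sample skeleton are hypothetical.

```python
def classify_skeleton(pixels):
    """Label each skeleton pixel from the number of skeleton pixels among its 8-neighbours."""
    pixels = set(pixels)
    labels = {}
    for (row, col) in pixels:
        neighbours = sum(
            (row + dr, col + dc) in pixels
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
        if neighbours == 1:
            labels[(row, col)] = "terminal"      # end of a segment
        elif neighbours == 2:
            labels[(row, col)] = "intermediate"  # part of a rail
        elif neighbours > 2:
            labels[(row, col)] = "branch"        # ramification
        else:
            labels[(row, col)] = "isolated"
    return labels


# Hypothetical Y-shaped skeleton: one vertical arm and two diagonal arms meeting at (2, 2).
skeleton = [(0, 2), (1, 2), (2, 2), (3, 1), (4, 0), (3, 3), (4, 4)]
labels = classify_skeleton(skeleton)
extreme = sorted(p for p, kind in labels.items() if kind in ("terminal", "branch"))
```

In this example the three pixels labelled intermediate are the rails, each joining the branch pixel to one of the terminal pixels.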
4 Relationship between Skeleton and Section

The problem of surface reconstruction from sections consists of obtaining a surface 
that connects the vertices of the contours that belong to each pair of adjacent sections. 
A supposition that covers most of the real cases is that the projection of this surface, 
on an intermediate plane parallel to the original sections, should be in the region that 
separates the material of interest of each pair of adjacent sections. The strange cases 
not covered by this supposition have not been dealt with by any of the consulted au- 
thors. Nevertheless, some results on these strange cases are included in Section 5. 



4.1 Construction of the Skeleton from Adjacent Sections 



When projecting the regions occupied by the material of interest of the adjacent sec- 
tions on a parallel plane (Fig. 3b), the region not common to both regions is the one 
that separates the material of interest of these sections (Fig. 3c). 




Fig. 3. Region that separates the material of interest in adjacent sections 

This action can be expressed using logical operations on binary images. Each
section is represented as a binary image where the region occupied by the material of
interest has been determined. The image I that separates the material of interest of two
adjacent sections is the result of the binary operation XOR (exclusive or) on the
corresponding two images. In this way, the only pixels in I that are drawn are those that
belong to the region of the material of interest in only one of the analyzed sections.
After this operation, it is necessary to include the pixels that form the boundary of each
one of the contours involved in order to be able to include in the skeleton the pixels
where the contours of the adjacent sections intersect (if any). To obtain the skeleton, it
only remains to apply a skeletonization technique (discussed in Section 3.1) to the
image I, whose result would be similar to the one shown in Fig. 2b.
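
The construction of the image I can be sketched with plain Python lists. This is an illustrative assumption, not the authors' code: sections are small binary images with 1 for the material of interest, and the boundary of a region is taken to be its pixels that have at least one 4-neighbour outside the region. All names and the sample sections are hypothetical.

```python
def xor_images(a, b):
    """Pixels that belong to the material of interest in exactly one section."""
    return [[pa ^ pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def boundary(img):
    """Object pixels with a 4-neighbour that is background or lies outside the image."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not img[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not img[ny][nx]:
                    out[y][x] = 1
                    break
    return out


def separating_image(a, b):
    """XOR of both sections plus the boundary pixels of each of them."""
    x, ba, bb = xor_images(a, b), boundary(a), boundary(b)
    return [[x[r][c] | ba[r][c] | bb[r][c] for c in range(len(x[0]))]
            for r in range(len(x))]


# Hypothetical sections: a 3x3 block in one section and its centre pixel in the other.
section_a = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
section_b = [[1 if (y, x) == (2, 2) else 0 for x in range(5)] for y in range(5)]

xor_only = xor_images(section_a, section_b)
image_i = separating_image(section_a, section_b)
```

The XOR alone leaves a ring around the centre pixel, and adding the boundaries restores the pixel of the inner contour, so both contours are available to the skeletonization step.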



4.2 Significance of the Skeleton 

As discussed in Section 3.2, the connectivity of the E(I) pixels is determined by the
shape of the boundary that the image I represents. However, as defined in
Section 4.1, the boundary of I is formed by the contours of the contiguous sections.
Therefore, the shape of the analyzed contours determines the connectivity of the pixels
of the resulting skeleton E(I). The analysis that follows is very similar to the one
discussed in Section 3.2, but takes into account that the shape of the image I is
conditioned locally by the separation of the nearby contours in the adjacent sections.



Reconstruction of Surfaces from Cross Sections Using Skeleton Information 185 






Fig. 4. Formation and fusion of ribbons 

As explained in Section 3.2, a rail separates equidistantly two portions of the
image boundary that present a similar shape along it. By construction of E(I), these
portions of the I boundary correspond to the projections of portions of the contours that
belong to the analyzed adjacent sections. It follows that a rail of E(I), built
according to Section 4.1, lies halfway between the projections of two nearby contour
portions (PC_Ca, PC_Cb) with similar shape located in the contours C_a and C_b,
respectively. If PC_Ca and PC_Cb belong to the same section, then the rail will represent
the place where the necessary ramification occurs, so that these portions will be
connected to each other at an intermediate height between the analyzed sections.

In this way, we arrive at the basic structure of the reconstruction, called ribbon,
which is composed of a rail L and a contour portion PC, such that they are close to
each other and keep some shape similarity. The proximity relationship implies that
there is no other rail or contour portion between L and PC. We can take advantage of
these ribbon properties to reconstruct the surface that the ribbon forms using some
simple and quick algorithm [2]. Notice that, due to the skeleton construction, each rail
has two associated ribbons, one for each of its sides.

During the reconstruction process, the endpoints of each contour portion should be 
included so that the union of the contour portions associated with each rail of the 
skeleton yields the original contours (Fig. 4b). In this way, the reconstruction of the 
surface between two adjacent sections is reduced to the union of the reconstruction of
all the ribbons that form it.

The large number of generated triangles is one of the drawbacks of some of the 
consulted methods that use the skeleton in the surface reconstruction process. In order 
to reduce the number of triangles, the ribbons that share a common rail and connect 
contour portions located in different sections can be fused together (Fig. 4c). 



5 Results and Discussion 

We have recently proposed a surface reconstruction method which implements all the
ideas presented in the two previous sections [11]. Fig. 5 displays some results on
synthetic and real examples that are discussed below. See [11] for more details.

The example shown along this work (Fig. 5a) contains two adjacent sections that 
not only have a different number of contours, but also a very marked difference in 
their shapes. The result on this example is shown in Figs. 5a and 5b. Another example 
refers to the existence of holes in some of the sections, and the corresponding result is 
shown in Figs. 5c and 5d. Finally, Figs. 5e-5i display the results on an example that 
has not been solved by any method in the consulted literature. It is a surface portion
that twists abruptly between the sections, causing its projection to lie inside the
region belonging to the material of interest in both sections.




Fig. 5. Tiling result (a, c, h) and 3D view (b, d, i, j, k) of some reconstruction examples



In all synthetic cases it can be observed that the reconstructed surface is closed and
does not intersect itself. The place where the ramifications occur is inserted at an
intermediate height between the original sections and shown in the figures as dotted lines.

The results of the proposed algorithm are also shown in the reconstruction of a
human jaw from real CT images (Figs. 5j, 5k), where the correct reconstruction of
ramifications is observed at the base of the teeth.



6 Conclusions 

In this paper, the foundations of the use of the skeleton to reconstruct a surface from 
cross sections have been explained and illustrated. 

After a review of the previously reported works that have used some type of
skeleton to solve the surface reconstruction problem, it was concluded that all of them
made an intuitive use of the skeleton and there was a lack of a basis that guaranteed a
complete and correct use of the skeleton information.

We have argued that there exists a close relationship between the contours of two
adjacent sections and the rails of the skeleton built from the region that separates the
material of interest in both sections. By skeleton construction, each rail separates
equidistantly the projections of two nearby contour portions with similar shape. This
property can be used to reconstruct in an easy way the region, called ribbon,
which lies between the rail and one of the related contour portions. If both contour
portions belong to the same section, then the rail can be used to represent the place where
the ramification occurs at an intermediate height between the original sections. Otherwise,
there is no ramification and, to reduce the number of triangles in the resulting surface,
the rail may be discarded by fusing the adjacent ribbons that share it. The final surface
reconstruction reduces to the union of the reconstruction of all the formed ribbons.

Some results of a surface reconstruction algorithm recently proposed by the
authors have been shown. This algorithm [11] is based on the ideas presented here to
solve the research problem. The examples displayed here have
included difficult cases, even one not solved in the consulted literature. In all cases, a
topologically correct surface is obtained. Moreover, all the cases are treated in a uni- 
fied way, independently of whether the number of contours in the adjacent sections is 
the same or not, or whether the shapes of the involved contours are similar or not. 
Hence, a great deal of generality is achieved in the proposed solution. 
Abstract. The principal steps of a new method to solve the problem of surface
reconstruction from parallel cross sections are presented in this paper. This
method constitutes the extension of one previously proposed by the authors
using the skeleton to solve the research problem. The method guarantees the
correct topology of the surface without altering the original contours. Some
results are shown that illustrate the excellent performance of the method in
particularly difficult cases not solved previously. All the cases analyzed are
handled in the same way. In real cases, the global time complexity improves on the
quadratic time of the quickest consulted methods.



1 Introduction 

The problem of reconstructing the surface of a solid object from a series of parallel
planar cross sections has been treated in the specialized literature over the last three
decades [3,5,8,9,14]. A cross section is formed by a set of closed contours defining
the boundary of the material of interest to be reconstructed. As a distance separates
the sections, information is often lost about the places where the ramifications occur in
the surface of interest. This causes a shape difference and a different number of
contours in adjacent sections (Fig. 1). A way to approach this problem is to create
intermediate sections representing the place where the ramifications occur [8-10].

In this work, two verification criteria are taken into account. These criteria have
been used by many authors (e.g. [1,3,9,10,14]): 1) The proposed solution should
obtain a topologically correct surface (in general, closed and not intersecting itself)
and 2) A resample of the same surface, in the place occupied by the original sections,
should produce the original data.

The authors of the present work previously proposed a new method [10] to solve
the branching problem. The method is based on the skeletonization technique to
create new contours, corresponding to an artificial intermediate slice that models the
level where branching occurs. This method successfully handles several
ramification cases without violating the verification criteria. However, it deals
neither with the cases of local protuberances not present in the adjacent section (Fig. 1a)
nor with the cases of multiple ramification where more than one contour of a section
should connect with more than one contour of the adjacent section (Fig. 1b).

Only a few of the consulted works ([1,8]) solve the "many to many" ramification
case (Fig. 1b), and none has reported a solution to the case in which surface portions
twist (Fig. 1c).




Fig. 1. Top view of several difficult cases 

In this work a new method, constituting an extension of [10], is proposed to offer an
efficient and automatic solution to the problem under study. The method reconstructs
a topologically correct surface without modifying the data of the original sections.

Below, in Section 2, the main steps of the method are described. Their complexity 
is analyzed in Section 3. Finally, in Section 4, some results are shown in different 
complicated examples. 



2 Proposed Method 

For each original section, the initial data are a set of closed contours that define the 
boundary of the material of interest to be reconstructed. The proposed method con- 
sists of applying five steps to each pair of adjacent sections. The pseudocode of the 
main subroutine would be: 

SUBROUTINE Reconstruction of model
  FOR EACH section S_i of model
    Detect correspondences between S_i and S_{i+1}
    FOR EACH corresponding contours set
      Construct skeleton image
      Obtain skeleton graph
      Form ribbons
      Tile ribbons
    END FOR EACH
  END FOR EACH
END SUBROUTINE

The first step determines the correspondences existing among the contours of the
sections analyzed. In this work, an overlapping method was used. This method
establishes that two contours should be connected by a surface if the projections of
the material of interest they enclose overlap beyond a certain threshold. All the
projections are made on a plane parallel to the original sections (usually the XY
plane). The following steps are explained below and, by way of example, their results
are shown in Fig. 2.
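
The overlap test of this first step can be illustrated with a small sketch. It is only a minimal example, assuming that the material enclosed by each contour has already been rasterized into a set of (x, y) pixels on the projection plane. The function names, the choice of normalizing by the smaller region and the default threshold value are assumptions of this sketch, not details given in the text.

```python
def overlap_ratio(pixels_a, pixels_b):
    """Fraction of the smaller projected region covered by the intersection."""
    if not pixels_a or not pixels_b:
        return 0.0
    common = len(pixels_a & pixels_b)
    return common / min(len(pixels_a), len(pixels_b))


def contours_correspond(pixels_a, pixels_b, threshold=0.2):
    """Two contours are linked when their projections overlap enough.

    The threshold value is a hypothetical default chosen for illustration.
    """
    return overlap_ratio(pixels_a, pixels_b) >= threshold


def filled_rectangle(x0, y0, x1, y1):
    """Helper for the demo: pixel set of an axis-aligned filled rectangle."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


lower = filled_rectangle(0, 0, 10, 10)         # contour in section S_i
upper_near = filled_rectangle(5, 0, 15, 10)    # shares half of its area
upper_far = filled_rectangle(20, 20, 30, 30)   # no common pixels
```

With these regions, `contours_correspond(lower, upper_near)` holds while `contours_correspond(lower, upper_far)` does not, so only the first pair would be passed on to the skeleton steps.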
2.1 Skeleton Image Construction

The second step uses the same projection information as the previous step to build
an image I representing the area that separates the material of interest of the analyzed
contours (Fig. 2b). Then a thinning algorithm, similar to the one used in [6], is applied
to obtain the skeleton E(I) (shown as thick lines in Fig. 2c). Optionally, the short
spurious branches ("hair") can be eliminated (Fig. 2d). For more details concerning this
step, see [10]. The skeleton built in this way offers very valuable information for
reconstructing, correctly and quickly, the surface that connects the corresponding
contours [12].




Fig. 2. Steps of the proposed method 





2.2 Obtaining the Skeleton Graph

Each black pixel of the skeleton image E(I), built in the previous section, is included
in the skeleton graph G. Structurally, G is formed by a list of nodes or extreme
vertices VE (thick dots in Fig. 2d) and a list of arcs or rails L (lines in Fig. 2d).
Each node VE contains its coordinates (x, y) and an ordered circular list of its
connections N. Each connection N contains a rail L and the pixel of L to which VE is
connected, called the neighboring vertex VV. For convenience, the order of the
connections follows the distribution of the neighboring vertices VV counterclockwise
around VE. Each rail of G contains two extreme vertices and a list of the intermediate
pixels that form the rail.
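
The structure of G described above could be represented as in the following sketch. The class and field names (`Rail`, `Connection`, `ExtremeVertex`, `marked`) are hypothetical and chosen only to mirror the text. Sorting the connections by the angle of VV around VE is one possible way to obtain the circular order, under the assumption of a y-up coordinate system.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Pixel = Tuple[int, int]


@dataclass(eq=False)
class Rail:
    """Arc L of G: its extreme vertices and the intermediate skeleton pixels."""
    pixels: List[Pixel] = field(default_factory=list)
    ends: List["ExtremeVertex"] = field(default_factory=list)


@dataclass(eq=False)
class Connection:
    """Connection N: a rail plus the pixel VV through which VE joins it."""
    rail: Rail
    neighbor: Pixel
    marked: bool = False   # used later when forming the ribbons


@dataclass(eq=False)
class ExtremeVertex:
    """Node VE: coordinates and a circular list of connections."""
    x: int
    y: int
    connections: List[Connection] = field(default_factory=list)

    def sort_connections(self) -> None:
        # Order by the angle of VV around VE (counterclockwise for y-up axes).
        def angle(c: Connection) -> float:
            return math.atan2(c.neighbor[1] - self.y, c.neighbor[0] - self.x)
        self.connections.sort(key=angle)

    def next_connection(self, index: int) -> Connection:
        # The list is circular: the successor of the last entry is the first.
        return self.connections[(index + 1) % len(self.connections)]


vertex = ExtremeVertex(5, 5)
for vv in [(5, 6), (4, 5), (6, 5), (5, 4)]:
    vertex.connections.append(Connection(Rail(ends=[vertex]), vv))
vertex.sort_connections()
```

In image coordinates, where y grows downwards, the same ordering appears clockwise on screen, so the sign of the y difference would have to be flipped.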



2.3 Formation and Fusion of Ribbons 

In this step, the close relationship that exists between the image and its skeleton is
used to simplify the final tiling, dividing the area to be reconstructed into parts called
ribbons (Fig. 2e). Each ribbon is composed of a rail L of G and a portion PC of one
of the contours analyzed. L and PC are close to each other and have similar shapes, so
that there is no other rail or contour portion inside the ribbon formed. The foundation
of this decomposition is discussed in [12].

As a result of the previous steps, the rails L and extreme vertices VE of G are
available. To form the ribbons we need to determine the portions PC of the original
contours. During the reconstruction process, the endpoints of each contour portion should
be linked so that the union of the contour portions associated with each rail of the
skeleton produces the original contours (Fig. 2e). Hence, in the union, these extreme
vertices of portions are the only repeated points. In this way, the surface
reconstruction between two adjacent sections is reduced to the union of all the ribbon
reconstructions. The pseudo-code of an algorithm that guarantees the correct and quick
selection of the contour portions to form the ribbons is presented below.

SUBROUTINE Form ribbons
  (a) Initialize contours and extreme vertices connections as unmarked
  FOR EACH branch vertex VE_i
    FOR EACH connection N_j of VE_i
      IF N_j IS unmarked
        (b) Find unmarked contour C and the nearest vertex V_ini
            that would be connected to VE_i between N_j and N_{j+1}
        (c) Choose connection N_ini that follows the direction of
            contour C FROM V_ini, N_j, N_{j+1}, VE_i
        (d) Form contour ribbons FROM C, V_ini, VE_i, N_ini
        (e) Mark the contour C
      END IF
    END FOR EACH
  END FOR EACH
END SUBROUTINE

The first step (a) labels all contours and rails (connections between extreme
vertices) as unmarked. In step (b), the vertex V_ini of an unmarked original contour C
that lies at the minimal distance from VE, but on the right side of the lines
(VE, VV_{j+1}) and (VV_j, VE), is found (Fig. 3a). In step (c), the connection whose VV
is located on the same side as V_{ini+1} with respect to the line (VE, V_ini) is selected
(Fig. 3b). This ensures that, when forming the ribbons related to the contour C, the
path followed along the rails has the same direction as C. In step (d), the subroutine
that forms the ribbons related to C is called. Its pseudo-code is given next.




Fig. 3. Principal steps of the formation of ribbons 
SUBROUTINE Form contour ribbons REQUIRE contour C, initial
    vertex V_ini, extreme vertex VE_ini, initial connection N_ini
  SET V EQUAL TO V_ini
  SET VE EQUAL TO VE_ini
  SET N EQUAL TO N_ini
  REPEAT
    (f) Determine next extreme vertex VE_sig and next connection
        N_sig FROM VE, N
    (g) Find the nearest vertex V_sig in contour C that would be
        connected to VE_sig FROM V
    (h) Insert ribbon V, VE, VE_sig
    (i) Mark the connections N and N_sig
    (j) Determine initial connection N of the next ribbon
        FROM V_sig, VE_sig, N_sig
    SET VE EQUAL TO VE_sig
    SET V EQUAL TO V_sig
  UNTIL (V EQUAL AS V_ini) AND (VE EQUAL AS VE_ini)
END SUBROUTINE

After executing steps (f)-(i), which are self-explanatory, it is necessary to
determine the initial connection N of the next ribbon. The three possible situations for
step (j) are shown in Fig. 3d-f. If VE_sig is terminal, then N = N_j
(Fig. 3f). If VE_sig is a branch vertex and V_sig is on the right side of the straight
lines (VE_sig, VV_j) and (VV_{j-1}, VE_sig), then N = N_{j-1} (Fig. 3d). Otherwise, the
rail and the contour cross an odd number of times and then N = N_{j+1} (Fig. 3e).

Optionally, to simplify the result, the adjacent ribbons whose borders belong to
contours of different sections may be fused. In this way, only the skeleton vertices
that are involved in ramifications remain (Fig. 2f).



2.4 Tiling of Ribbons 

As described in [12], a ribbon is composed of a rail L and a contour portion PC
that are close to each other and similar in shape. This property can be exploited
to tile the surface it forms using some simple and quick algorithm [5]. In addition, the
first verification criterion mentioned in Section 1 can always be satisfied.

Finally, in the fifth step of the main procedure, the tiling of each ribbon is
performed and the final surface is obtained as their union (Fig. 2g-h). The height of the
skeleton vertices is intermediate between those of the analyzed sections, which
guarantees the second verification criterion.
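
As an illustration of how a single ribbon might be tiled, the sketch below triangulates the strip between a rail and a contour portion with a shortest-diagonal greedy rule. This rule is an assumption made for the example: the text only states that a simple and quick algorithm [5] is used, and the function name `greedy_tile` is hypothetical. Both polylines are assumed to be open, given as lists of 3D points, and traversed in the same direction.

```python
import math


def greedy_tile(rail, portion):
    """Triangulate the strip between two open polylines of 3D points.

    At every step the shorter of the two candidate diagonals is chosen,
    which advances along either the rail or the contour portion.
    """
    i, j = 0, 0
    triangles = []
    while i < len(rail) - 1 or j < len(portion) - 1:
        # Advance on the rail if the portion is exhausted, or if the rail
        # still has points left and its diagonal is the shorter one.
        advance_rail = j == len(portion) - 1 or (
            i < len(rail) - 1
            and math.dist(rail[i + 1], portion[j])
            <= math.dist(rail[i], portion[j + 1])
        )
        if advance_rail:
            triangles.append((rail[i], rail[i + 1], portion[j]))
            i += 1
        else:
            triangles.append((rail[i], portion[j + 1], portion[j]))
            j += 1
    return triangles


# Rail at an intermediate height, contour portion on the lower section.
rail = [(0.0, 1.0, 0.5), (1.5, 1.0, 0.5), (3.0, 1.0, 0.5)]
portion = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
triangles = greedy_tile(rail, portion)
```

Each iteration consumes exactly one edge of one polyline, so the number of triangles is the total number of edges and the running time is linear in the number of points, in line with the linear cost attributed to the tiling step.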



3 Complexity Analysis 

The overall complexity of the proposed method is O(n · m), where n is the number of
vertices and m is the number of contours in the analyzed adjacent sections.

For the calculation of the complexity of the first step, it is assumed that the
number of pixels to process is proportional to n. Both the construction of the skeleton
image and the extraction of its skeleton graph can be performed in linear time O(n).
The plots in Fig. 4 show the execution times of these steps in a real
example composed of 151 sections, 449 contours and 91941 vertices. Their trends
(thick lines) confirm the indicated linear time complexities.




Fig. 4. Results of tests for construction of skeleton image (left) and graph (right) from
different contours (running on a PC with an Intel Pentium® processor at 736 MHz and
128 Mbyte of RAM)



The complexity of the fourth step is dominated by the initial search for the contour
and nearest vertex. This step is run as many times as there are contours in the
analyzed sections. Since in each call to this step the vertices belonging to the already
marked contours are not treated, its complexity is O(n · m). This is obtained from:

A very quick algorithm is used for the ribbon reconstruction (fifth step), which
has linear complexity O(n) [5].



4 Results and Discussion 

Next, some results of the application of the proposed method to different
synthetic examples are shown. For more details, see [11].

In a case of two contours with very different shapes, similar to Fig. 1a, the results of
surface reconstruction using three classical methods (greedy, optimization and
contour composition; similar to [5,7,13], respectively) are shown in Fig. 5a-c, whereas
the result of the proposed method is displayed in Fig. 5d. In Figs. 5e-i, different
ramification examples (without holes in the sections) are shown where a contour
ramifies into two (5e-g) and three (5h-i) contours; Figs. 5g and 5i contain the results
of our method. Another example refers to the existence of holes in some of the sections
(Fig. 5k-l). A case rarely treated in the consulted literature occurs when several
contours of a section should be connected to several contours of the contiguous section;
the results of our method on two such examples (the latter also with a hole) can be
seen in Figs. 5j and 5m. A case not addressed by the consulted literature is
shown in Fig. 6, in which a part of the surface twists abruptly between the sections.



194 J. Pina Amargos and R. Alquezar Mancho 




Fig. 5. 3D view of some reconstruction examples. a), b) are taken from [1]; c), f) from
[13]; e) is taken from [4] and h), k) from [2]; d), g), i), j), l) and m) are results of
the proposed method





Fig. 6. Contours (a), detail of tiling (b) and 3D view (c) of a very difficult example 

All the analyzed cases show the quality of the results, not only from the aesthetic
point of view but also in the satisfaction of the verification criteria stated in
Section 1. The rails related to the ramification are inserted at an intermediate height
between the original sections (dotted lines in Figs. 5, 6).



5 Conclusions 

The main steps of a new method to reconstruct a surface from a set of cross sections
have been described. This method extends one previously proposed by the authors, which
uses the skeleton to solve the same problem [10].

The proposed method always reconstructs the surface of the whole projected area
separating the material of interest between each pair of adjacent sections. It
guarantees the correct topology of the reconstructed surface, because the new vertices
that model the places where the contours ramify are inserted at an intermediate height
between the adjacent sections without altering the original contours.

The method is general, simple and quick. It handles in the same way
all the cases reported in the literature and even one not attempted by other authors. Its
overall complexity is O(n · m), where n is the number of vertices and m is the number
of contours in the analyzed adjacent sections. This improves on the O(n²) complexity of
the quickest consulted methods.

Some application results have been shown in different examples that, regardless of 
their high degree of complexity, illustrate the excellent performance of the method. 
Abstract. This paper describes a Fourier domain algorithm for surface
height recovery using shape from shading. The algorithm constrains sur- 
face normals to fall on an irradiance cone. The axis of the cone points in 
the light source direction. The opening angle of the cone varies with iter- 
ation number, and is such that the surface normal minimizes brightness 
error and satisfies the integrability constraint. The results show that the 
method recovers needle maps that are both smooth and integrable, with 
improved surface stability. 



1 Introduction 

Shape-from-shading (SFS) is a problem in computer vision which has been an
active topic of research for some three decades. The process was identified by
Marr [10] as a key process in the computation of the 2.5D sketch, and was studied
in depth by Horn [5]. The topic has also been the focus of recent research in the
psychophysics literature [9][2][3]. Stated more formally, the SFS problem can be
regarded as that of calculating the set of partial derivatives $(Z_x, Z_y)$
corresponding to a surface $Z = Z(x,y)$, where the input is simply an intensity image.¹
In brief, we need to solve the image irradiance equation,
$E(x,y) = R(p(x,y), q(x,y), s)$, where $E$ is the intensity value of the pixel with
position $(x,y)$, $R$ is a function referred to as the reflectance map [6], which maps
the surface gradients $p = \partial Z/\partial x$ and $q = \partial Z/\partial y$ to an
intensity value, and $s$ is the light source direction. If the surface normal at the
location $(x,y)$ is $n = (p, q, -1)$, then under the Lambertian reflectance model the
image irradiance equation becomes $E(x,y) = n \cdot s$.

Unfortunately, the image irradiance equation is underconstrained, and the
family of surface normals falls on a reflectance cone whose apex angle $\alpha$ is equal
to $\cos^{-1} E(x,y)$, and whose axis points in the light source direction $s$. Several
constraints have been used to overcome the underconstrained nature of the
Lambertian shape-from-shading problem. However, their main drawback is that they
have a tendency to oversmooth the recovered surface slopes and result in poor
data-closeness. The net result is a loss of fine surface detail. For a complete
survey of most SFS methods, see [16].

* Supported by National Council of Science and Technology (CONACYT), Mexico,
under grant No. 141485.

¹ More than one image can be used, but this is an extension of SFS referred to as
photometric stereo.

In a recent paper Worthington and Hancock [14] have demonstrated how
these problems may be overcome by constraining the surface normals to lie on
the reflectance cone and allowing them to rotate about the light source direction
subject to curvature consistency constraints. Unfortunately, the needle maps
delivered by the method are not guaranteed to satisfy the integrability constraint,
which means that the recovered partial derivatives are not independent of the
path of integration (i.e. the height function may not be recoverable). Besides,
these needle maps also suffer from a high dependency on the image intensities,
making the method sensitive to noisy data such as specularities, roughness
and overshadowed areas.

There are a number of ways in which a surface may be recovered from a field 
of surface normals [7,8]. One approach is to use trigonometry to increment the 
height function along a path or a front [1,12]. However, one of the most elegant 
approaches is that described by Frankot and Chellappa [4] which shows how the 
surface may be reconstructed subject to integrability constraints by performing 
a Fourier analysis of the field of surface normals. 

The aim in this paper is to develop a shape-from-shading scheme that can be
used to recover integrable needle maps subject to hard constraints on Lambertian
reflectance, while relaxing the dependence on image intensity driven by such
constraints. In order to demonstrate how the two techniques can be combined, in
subsequent sections we will briefly explain the geometric approach developed by
Worthington and Hancock [14] for solving SFS, as well as the algorithm proposed
by Frankot and Chellappa [4] for enforcing integrability in SFS.

2 Geometric Approach for SFS 

Worthington and Hancock [14] have developed an SFS method in which the
image irradiance equation is treated as a hard constraint by constraining the
recovered surface normals to lie on the reflectance cone. Suppose that $\bar{N}_k$ is a
smoothed² surface normal at step $k$ of the algorithm; then the update equation
for the surface normal directions is

$N_{k+1} = \Theta \bar{N}_k$   (1)

where $\Theta$ is a rotation matrix computed from the apex angle $\alpha$ and the angle
between the current smoothed estimate of the surface normal direction $\bar{N}_k$ and
the light source direction. To restore the surface normal to the irradiance cone,
it must be rotated by an angle $\theta$, the difference between these two angles,

² For further details about the suggested method for smoothing the normal field, see
[13].



198 



M. Castelan and E.R. Hancock 



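about the axis $(u, v, w)^T = \bar{N}_k \times s$. Hence, the rotation matrix is

$\Theta = \begin{pmatrix} c + u^2 c' & -ws + uvc' & vs + uwc' \\ ws + uvc' & c + v^2 c' & -us + vwc' \\ -vs + uwc' & us + vwc' & c + w^2 c' \end{pmatrix}$   (3)

where $c = \cos(\theta)$, $c' = 1 - c$ and $s = \sin(\theta)$.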

The needle maps delivered by this geometric framework have proved to be 
useful in experiments for topography-based object recognition [15]. 
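
The rotation of a smoothed normal back onto the irradiance cone can be sketched as follows. This is an illustrative reading of the update, not the authors' implementation. The sign convention $\theta = \beta - \alpha$, where $\beta$ is the angle between the smoothed normal and the light source, is an assumption of this sketch that goes with the axis $\bar{N} \times s$. The helper names and the handling of the degenerate parallel case are also hypothetical.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(vi / length for vi in v)


def rotation_matrix(axis, theta):
    """Rotation by theta about a unit axis (u, v, w), as in equation (3)."""
    u, v, w = normalize(axis)
    c, s = math.cos(theta), math.sin(theta)
    c1 = 1.0 - c
    return ((c + u * u * c1, -w * s + u * v * c1, v * s + u * w * c1),
            (w * s + u * v * c1, c + v * v * c1, -u * s + v * w * c1),
            (-v * s + u * w * c1, u * s + v * w * c1, c + w * w * c1))


def rotate_onto_cone(n_bar, light, intensity):
    """Rotate the smoothed normal so that its angle to the light is acos(E)."""
    n_bar, light = normalize(n_bar), normalize(light)
    alpha = math.acos(max(-1.0, min(1.0, intensity)))          # cone apex angle
    beta = math.acos(max(-1.0, min(1.0, dot(n_bar, light))))   # current angle
    axis = cross(n_bar, light)
    if dot(axis, axis) < 1e-24:
        return n_bar  # normal parallel to the light: the axis is undefined
    theta = beta - alpha  # assumed sign: positive values rotate towards the light
    rot = rotation_matrix(axis, theta)
    return tuple(dot(row, n_bar) for row in rot)


n_new = rotate_onto_cone((1.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.9)
```

After the rotation, the dot product of `n_new` with the unit light direction equals the normalized intensity, which is exactly the hard data-closeness constraint of the Lambertian model.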

3 Integrability in SFS 

The integrability condition in SFS ensures that the recovered surface satisfies
the following condition on the partial derivatives of the height function:
$Z_{xy} = Z_{yx}$. This condition can also be regarded as a smoothness constraint, since
the partial derivatives of the surface need to be continuous in order to be
integrable, or independent of the path of integration. In [4] Frankot and Chellappa
proposed a method to project a gradient field to the nearest integrable solution.
They suggested using a set of integrable basis functions to represent the surface
slopes so as to minimize the distance between an ideally integrable gradient field
and a non-integrable one.

Following [4], if the surface $Z$ is given by

$Z(x,y) = \sum_{\omega \in \Omega} C(\omega)\,\phi(x,y,\omega)$   (4)

where $\omega$ is a two-dimensional index belonging to a domain $\Omega$, and
$\phi(x,y,\omega)$ is a set of basis functions which are not necessarily mutually
orthogonal, the partial derivatives of $Z$ can also be expressed in terms of this set
of basis functions using the formulae

$Z_x(x,y) = \sum_{\omega \in \Omega} C(\omega)\,\phi_x(x,y,\omega)$ and $Z_y(x,y) = \sum_{\omega \in \Omega} C(\omega)\,\phi_y(x,y,\omega)$   (5)

Given that $\phi_x(x,y,\omega)$ and $\phi_y(x,y,\omega)$ are integrable, then so are the
mixed partial derivatives of $Z(x,y)$.

In the same way, the possibly non-integrable gradient field (which, indeed, is
the only information we have) can be represented as

$\hat{Z}_x(x,y) = \sum_{\omega \in \Omega} C_1(\omega)\,\phi_x(x,y,\omega)$ and $\hat{Z}_y(x,y) = \sum_{\omega \in \Omega} C_2(\omega)\,\phi_y(x,y,\omega)$   (6)

Note that, as $C_1 \neq C_2$, then $\hat{Z}_{xy} \neq \hat{Z}_{yx}$.

The goal then is to find the set of coefficients that minimize the quantity

$d[(Z_x, Z_y), (\hat{Z}_x, \hat{Z}_y)] = \iint |Z_x - \hat{Z}_x|^2 + |Z_y - \hat{Z}_y|^2 \, dx\, dy$   (7)
As Frankot and Chellappa proved, the set of coefficients $C(\omega)$ minimizing the
distance given by the above equation is

$C(\omega) = \frac{P_x(\omega) C_1(\omega) + P_y(\omega) C_2(\omega)}{P_x(\omega) + P_y(\omega)}$   (8)

where $P_x(\omega)$ and $P_y(\omega)$ are $\iint |\phi_x(x,y,\omega)|^2 \, dx\, dy$ and
$\iint |\phi_y(x,y,\omega)|^2 \, dx\, dy$ respectively.

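If $\phi(x,y,\omega)$ is assumed to be the set of Fourier basis functions
$\exp(j\omega_x x + j\omega_y y)$, with $\Omega \equiv (2\pi n, 2\pi m)$, where
$n \in \{0, 1, \dots, N-1\}$ and $m \in \{0, 1, \dots, M-1\}$ for an $N \times M$ image,
then $P_x \propto \omega_x^2$, $P_y \propto \omega_y^2$,
$C_1(\omega) = C_x(\omega)/j\omega_x$, and $C_2(\omega) = C_y(\omega)/j\omega_y$.
Therefore, (8) is represented in the Fourier domain by

$C(\omega) = \frac{-j\omega_x C_x(\omega) - j\omega_y C_y(\omega)}{\omega_x^2 + \omega_y^2}$   (9)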



In this manner, by projecting the set of coefficients C{uj) back to the spatial 
domain, a height map corresponding to the nearest integrable surface Z{x,y) 
can be obtained from the input gradient field. 
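
A compact sketch of the projection in equation (9), written with NumPy's FFT, is given below. It is a minimal illustration under several assumptions: the gradient fields `p` (derivative along x, the column index) and `q` (derivative along y, the row index) are treated as periodic, frequencies are taken in radians per pixel, and the undetermined mean height is set to zero. The function name is hypothetical.

```python
import numpy as np


def integrate_gradient_field(p, q):
    """Nearest integrable surface for gradients p = Z_x, q = Z_y (eq. 9)."""
    rows, cols = p.shape
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)   # angular frequencies along x
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)   # angular frequencies along y
    WX, WY = np.meshgrid(wx, wy)              # both of shape (rows, cols)
    Cx = np.fft.fft2(p)
    Cy = np.fft.fft2(q)
    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0                         # avoid 0/0 at the DC term
    C = (-1j * WX * Cx - 1j * WY * Cy) / denom
    C[0, 0] = 0.0                             # mean height is not recoverable
    return np.real(np.fft.ifft2(C))


# Periodic test surface with analytically known gradients.
yy, xx = np.mgrid[0:16, 0:32]
Z_true = np.sin(2 * np.pi * xx / 32) + 0.5 * np.cos(4 * np.pi * yy / 16)
p = (2 * np.pi / 32) * np.cos(2 * np.pi * xx / 32)
q = -0.5 * (4 * np.pi / 16) * np.sin(4 * np.pi * yy / 16)
Z_rec = integrate_gradient_field(p, q)
```

For an already integrable field such as this one, the projection leaves the surface unchanged, so `Z_rec` matches `Z_true` up to numerical precision. For a non-integrable field it returns the least-squares closest integrable surface in the chosen basis.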



4 Introducing the Integrability Condition in the 
Geometric Approach for SFS 

The idea underpinning this paper is to calculate the nearest integrable surface 
and obtain the apex angle of the Lambertian cone on this surface after each 
iteration. The algorithm can be summarized as follows: 

1. Calculate an initial estimate of surface normals $N = (N_x, N_y, N_z)$.

2. Smooth $N$ to obtain $\bar{N}$.

3. Obtain the nearest integrable surface $Z$ by solving (9) using the smoothed
surface normal field $\bar{N}$.

4. Get the apex angle $\alpha$ of the Lambertian cone using the values of $Z$, that is
to say, $\alpha = \cos^{-1}(Z)$.

5. Calculate $N'$ by rotating $\bar{N}$ using (1).

6. Make $N = N'$ and return to step 2. Repeat until a desired number of
iterations has been reached.

We should note that in this method the rotation matrix does not remain
static through the iterative process, since the changes in $\alpha$ depend on the
recovered surface after each iteration. It is also important to note that, due to the
projection of the surface normals onto the reflectance cone after each iteration, the
z-component $N_z$ of the normal $N$ will always correspond to the calculated height
surface of the final gradient field when using the Frankot and Chellappa height
recovery method. By contrast, in the method of Worthington and Hancock the
z-component $N_z$ will always be the normalised input intensity image. Therefore,
besides calculating surface gradients, the new algorithm also calculates height
information.
5 Experiments

The algorithm was tested on synthetic as well as real images. The evaluation
criteria were based on the absolute height difference and the degree of gradient
consistency (i.e. the percentage of pixels of every image whose differences
$Z_{xy} - Z_{yx}$ are less than or equal to a certain threshold³). In our experiments we
have compared the results obtained with the geometric approach of Worthington and
Hancock, and the new integrable-geometric approach.

Three synthetic images were tested⁴. Forty real-world images (fifteen of these
with corresponding height data, taken from the range database in [17], and the
rest taken from [11] and from [16]) were also used for tests⁵.





Fig. 1. Left: plot of the absolute height differences for synthetic and range images. 
Right: plot of the gradient consistency degree tests. 



Figure 1 (left) shows the results for the absolute height differences. The
original approach is represented by the dotted line, while the new one is represented
by the solid line. The plot reveals that there seems to be no bias favoring
either method, and that the height difference between them is not significant.

The results of the experiments for degree of gradient consistency are
summarized in Figure 1 (right). The figure shows that the combined algorithm (solid
line) gives more consistent results than the original one (dotted line), as the
percentage of gradient consistency is greater for the new approach. This suggests
that the new method is enforcing integrability in the original method.

In a further analysis of the results, Figure 2 shows a 3D plot of the recovered
heights for each method. The first column corresponds to the input image, the
second column is a plot of the range data of each image given as a base for
comparison, the third column represents the recovered height map for the new
method and the fourth column shows the height maps for the original method.

³ For all the experiments this threshold was set to 0.1.

⁴ These images were also used in [16].

⁵ For all the tests, the light source direction was assumed to be [0,0,1].
Fig. 2. Recovered height surface for range images. From left to right: intensity image,
range data, recovered surface for the combined algorithm, recovered surface for the 
original algorithm. 



We can observe that the new algorithm seems to stabilize the surface, avoiding
some of the sudden changes present in the recovered surface for the original
method. Specifically, in the cases of the frog and the pelican, the recovered
surface appears to be smoother, with none of the spurious peaks in the height
map which result from the use of the original method. Also, the height plots
of Buddha and Mozart show a more stable surface than those produced by the
original method.

Figure 3 shows the recovered needle maps for each method. A visual
examination of the results suggests that the new method delivers needle maps that are
both smoother and contain more fine topographic detail than the original method.
This effect is more evident in the cases of the frog and Buddha.

Fig. 3. Recovered needle maps for each method. Top row: combined algorithm. Bottom
row: original algorithm.



6 Conclusions 

In this paper we have demonstrated how to impose integrability constraints on
the geometric approach for SFS suggested by Worthington and Hancock. We
follow Frankot and Chellappa and impose the constraints in the Fourier domain.
Experiments reveal that the resulting method exhibits improved robustness and
gradient consistency. Although the height difference statistics do not
reveal any systematic improvement in algorithm performance, both the recovered
height surfaces and the needle maps delivered by the new algorithm appear
to be better behaved and also preserve fine surface detail. It is important to
note that in this new method the calculation of surface orientations is less
constrained by the irradiances of the image, as the rotation matrix changes
through the iterative process. This is a way of relaxing the original method's
problem of hard constraints on data-closeness with the image irradiance
equation. Our future plans include using alternative basis functions, in particular
the discrete cosine transform, as well as comparing the output needle maps for
local integration tests.
Abstract. This paper describes two new methods for lens distortion
calibration using image and point correspondences. Images (or fea- 
ture points) captured by a camera are undistorted and projected into 
a calibration pattern image. Both methods apply the Gauss-Newton- 
Levenberg-Marquardt non-linear optimization technique to match, in 
one case, the camera image and the pattern image, and in the other 
case, selected point correspondences from the camera image to the pat- 
tern image. One way to automatically find good point correspondences is 
presented. Experimental results compare the performance of both meth- 
ods and show better results using point to point correspondences. 



1 Introduction 

Most algorithms in 3-D Computer Vision rely on the pinhole camera model
because of its simplicity, whereas video optics, especially wide-angle lenses, generate
a lot of non-linear distortion. In some applications, for instance in stereo vision
systems, this distortion can be critical.

Camera calibration consists of finding the mapping between the 3-D space
and the camera plane. This mapping can be separated into two different
transformations: first, the relation between the origin of 3-D space and the camera
coordinate system, which forms the external calibration parameters (3-D rotation
and translation), and second, the mapping between 3-D points in space and
2-D points on the camera plane in the camera coordinate system, which forms
the internal calibration parameters [1].

This paper introduces two new methods to find the internal calibration
parameters of a camera, specifically those parameters related to the radial
distortion due to wide-angle lenses.

The first method works with two images, one from the camera and one from 
a calibration pattern (without distortion) and it is based on a non-linear op- 
timization method to match both images. The search is guided by analytical 
derivatives with respect to a set of calibration parameters. The image from the 
calibration pattern can be a scanned image, an image taken by a high quality 
digital camera (without lens distortion), or even the binary image of the pattern 
(which printed becomes the pattern). 

The second method works with point correspondences from the camera image
to the pattern image, and applies a procedure similar to the first method to



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 204—211, 2003. 
(c) Springer- Verlag Berlin Heidelberg 2003 



Correcting Radial Lens Distortion Using Image and Point Correspondences 205 




Fig. 1. The distortion process due to the lens: (a) pattern in front of the camera;
(b) image captured by the camera; (c) both images



find the best set of parameters. The set of point correspondences is computed
automatically, taking advantage of the results of the first method.

The rest of this paper is organized as follows. Sections 2 and 3 describe the
distortion and projective models that we are using. Sections 4 and 5 present
the methods to match images and to match pairs of points, respectively.
Experimental results are shown in Section 6. A brief comparison of previous calibration
methods with our methods is given in Section 7. Finally, some conclusions are given
in Section 8.



2 The Distortion Model 

The distortion process is illustrated in Figure 1. Figure 1 (b) shows an image
taken by the camera when the pattern shown in Figure 1 (a) is in front of the
camera. Note the effect of the lens: the image is distorted, especially in those parts
far away from the center of the image. Figure 1 (c) shows the radial distortion in
detail, supposing that the center of distortion is the point $C_d$ with coordinates
$(c_x, c_y)$. The undistorted pixel at position $R$ with coordinates $(x, y)$ points to
the pixel $R_u$ with coordinates $(x_u, y_u)$.

Let $I_d$ be the distorted image captured by the camera and $I_u$ the undistorted
image associated with $I_d$. The relationship between both images is modeled by:

$I_u(\theta^d, x, y) = I_d(x_u(\theta^d, x, y), y_u(\theta^d, x, y)), \quad \theta^d = (k_1, k_2, c_x, c_y, s_x)$   (1)

$x_u = c_x + \frac{x - c_x}{s_x}(1 + k_1 r^2 + k_2 r^4), \quad y_u = c_y + (y - c_y)(1 + k_1 r^2 + k_2 r^4)$

$r^2 = \left(\frac{x - c_x}{s_x}\right)^2 + (y - c_y)^2$

where $\theta^d$ are the internal calibration parameters of the camera. Parameters $k_1$
and $k_2$ define how strong the radial distortion is, with distortion center
$(c_x, c_y)$. Parameter $s_x$ is the aspect ratio of pixels ($s_x = 1$ means square
pixels).
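
The coordinate mapping of the distortion model can be sketched in a few lines. The printed equations are partly illegible, so the placement of the aspect ratio $s_x$ (dividing the horizontal offset both in the radius and in the mapped coordinate) is an assumption of this sketch. The function name is also hypothetical. With $s_x = 1$ the placement makes no difference.

```python
def radial_map(x, y, k1, k2, cx, cy, sx=1.0):
    """Map pixel (x, y) to (x_u, y_u) with a two-coefficient radial model.

    Assumption: the horizontal offset is divided by the aspect ratio sx
    before computing the radius and the mapped x coordinate.
    """
    dx = (x - cx) / sx
    dy = y - cy
    r2 = dx * dx + dy * dy                  # squared distance to the center
    factor = 1.0 + k1 * r2 + k2 * r2 * r2   # 1 + k1 r^2 + k2 r^4
    return cx + dx * factor, cy + dy * factor


center = radial_map(100.0, 100.0, 1e-4, 1e-8, 100.0, 100.0)
unchanged = radial_map(130.0, 80.0, 0.0, 0.0, 100.0, 100.0)
pushed_out = radial_map(110.0, 100.0, 1e-4, 0.0, 100.0, 100.0)
```

The distortion center is a fixed point of the mapping, zero coefficients give the identity for square pixels, and a positive $k_1$ moves points away from the center by an amount that grows with the radius.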
3 The Projection Model

Figure 1 shows and ideal case, where the plane of the pattern is parallel to 
the camera plane and center of the pattern coincides with the optical axis of 
the camera. Using homogeneous coordinates, the class of 2-D planar projective 
transformations between the camera plane and the pattern plane is given by [5] 
[x' ,y' jW'Y = M[cc, y, ui]*, where matrix M has eight independent parameters. 



M = 



mo mi m2 
m3 m4 ms 
iriQ rriY 1 



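Plane and homogeneous coordinates are related by $(x_{p1} = x/w, y_{p1} = y/w)$
for one plane and $(x_{p2} = x'/w', y_{p2} = y'/w')$ for the other plane. Let $I_p$
be the projection from the camera plane (with the undistorted image $I_u$) to the
pattern plane. The new image is given by:

$$I_p(\theta^p, x_u, y_u) = I_u(x_p(\theta^p, x_u, y_u), y_p(\theta^p, x_u, y_u)), \qquad \theta^p = (m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7) \qquad (2)$$

$$x_p = \frac{m_0 x_u + m_1 y_u + m_2}{m_6 x_u + m_7 y_u + 1}, \qquad y_p = \frac{m_3 x_u + m_4 y_u + m_5}{m_6 x_u + m_7 y_u + 1}$$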
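
As a hedged illustration of the projective mapping of eq. 2, the sketch below applies the eight parameters to a single point. The function name `project` and the test values are hypothetical; only the two rational expressions for $x_p$ and $y_p$ are taken from the text.

```python
def project(xu, yu, m):
    """Apply the 8-parameter planar projective transformation of eq. 2.

    m is the tuple (m0, ..., m7); the ninth matrix entry is fixed to 1.
    """
    d = m[6] * xu + m[7] * yu + 1.0             # common denominator
    xp = (m[0] * xu + m[1] * yu + m[2]) / d
    yp = (m[3] * xu + m[4] * yu + m[5]) / d
    return xp, yp


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
shift = (1.0, 0.0, 5.0, 0.0, 1.0, -3.0, 0.0, 0.0)
print(project(12.0, 34.0, identity))   # the identity leaves the point unchanged
print(project(12.0, 34.0, shift))      # m2 and m5 act as a translation
```

The identity tuple used here is the same initial value of the projection parameters that the experiments section reports.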



4 The Image Registration Method 

The goal is to find a set of parameters $\theta^d$ and $\theta^p$ so that the
projected image, $I_p$, matches the image, $I_r$, of the calibration pattern put in
front of the camera.

We formulate the goal of internal calibration as finding a set of parameters
$\theta = (m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, k_1, k_2, c_x, c_y, s_x)$ such
that the sum, $E_t$, of square differences between pixels of $I_p$ and $I_r$ is a
minimum:

$$\theta = \arg\min E_t(I_p(\theta), I_r) = \arg\min \sum_{\forall (x,y) \in I_r} (I_p(\theta, x, y) - I_r(x, y))^2 \qquad (3)$$



4.1 Non-linear Optimization 

The Gauss-Newton-Levenberg-Marquardt method (GNLM) [3] is a non-linear
iterative technique specifically designed for minimizing functions that have
the form of a sum of squared functions, like $E_t$. At each iteration, the
increment of the parameters, $\delta\theta$, is computed by solving the following
linear matrix equation:

$$A\,\delta\theta = B, \qquad A = [J^t J + \lambda I], \qquad B = -J^t e \qquad (4)$$



If there are $p$ pixels in the images and $q$ parameters in $\theta$, $A$ is a
matrix of dimension $q \times q$. Matrix $J$, of dimension $p \times q$, is the
Jacobian of $e$. $I$ is the identity matrix, and $e$ is the vector of all pixel
differences between both images and has dimension $p \times 1$, so $B$ has
dimension $q \times 1$. $\lambda$ is a parameter which is allowed to vary at each
iteration. After a little algebra, the elements of $A$ and $B$ are computed using
the following formulas,



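$$a_{ij} = \sum_{k=1}^{p} \frac{\partial e_k}{\partial \theta_i} \frac{\partial e_k}{\partial \theta_j}, \qquad b_i = -\sum_{k=1}^{p} e_k \frac{\partial e_k}{\partial \theta_i}, \qquad e_k = I_p(\theta, x_k, y_k) - I_r(x_k, y_k) \qquad (5)$$

The damping term $\lambda$ of eq. 4 is then added to the diagonal elements $a_{ii}$.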



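Applying the chain rule to compute the partial derivatives and considering eq.
2, we get,

$$\frac{\partial e_k}{\partial \theta_i} = \frac{\partial I_p(\theta, x = x_k, y = y_k)}{\partial \theta_i} = \frac{\partial I_p(x_p, y_p)}{\partial x_p} \frac{\partial x_p}{\partial \theta_i} + \frac{\partial I_p(x_p, y_p)}{\partial y_p} \frac{\partial y_p}{\partial \theta_i} \qquad (6)$$

In order to simplify the notation, we use $x_p$ instead of $x_{pk}$ and $y_p$ instead
of $y_{pk}$. $\partial I_p(x_p, y_p)/\partial x_p$ and $\partial I_p(x_p, y_p)/\partial y_p$
are the partial derivatives of the image $I_p$ in the $x$ and $y$ directions.
$\partial x_p/\partial \theta_i$ and $\partial y_p/\partial \theta_i$ for
$(\theta_0, \ldots, \theta_7)$ can be derived from eq. 2,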



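$$\begin{aligned}
\frac{\partial x_p}{\partial m_0} &= \frac{x_u}{D}, & \frac{\partial y_p}{\partial m_0} &= 0, \\
\frac{\partial x_p}{\partial m_1} &= \frac{y_u}{D}, & \frac{\partial y_p}{\partial m_1} &= 0, \\
\frac{\partial x_p}{\partial m_2} &= \frac{1}{D}, & \frac{\partial y_p}{\partial m_2} &= 0, \\
\frac{\partial x_p}{\partial m_3} &= 0, & \frac{\partial y_p}{\partial m_3} &= \frac{x_u}{D}, \\
\frac{\partial x_p}{\partial m_4} &= 0, & \frac{\partial y_p}{\partial m_4} &= \frac{y_u}{D}, \\
\frac{\partial x_p}{\partial m_5} &= 0, & \frac{\partial y_p}{\partial m_5} &= \frac{1}{D}, \\
\frac{\partial x_p}{\partial m_6} &= \frac{-x_u x_p}{D}, & \frac{\partial y_p}{\partial m_6} &= \frac{-x_u y_p}{D}, \\
\frac{\partial x_p}{\partial m_7} &= \frac{-y_u x_p}{D}, & \frac{\partial y_p}{\partial m_7} &= \frac{-y_u y_p}{D},
\end{aligned} \qquad (7)$$

where $D = m_6 x_u + m_7 y_u + 1$. Partial derivatives with respect to the
distortion parameters are derived from eq. 1 and two more applications of the
chain rule,

$$\frac{\partial x_p}{\partial \theta_i} = \frac{\partial x_p}{\partial x_u} \frac{\partial x_u}{\partial \theta_i} + \frac{\partial x_p}{\partial y_u} \frac{\partial y_u}{\partial \theta_i}, \qquad \frac{\partial y_p}{\partial \theta_i} = \frac{\partial y_p}{\partial x_u} \frac{\partial x_u}{\partial \theta_i} + \frac{\partial y_p}{\partial y_u} \frac{\partial y_u}{\partial \theta_i} \qquad (8)$$

$$\begin{aligned}
\frac{\partial x_p}{\partial x_u} &= \frac{D m_0 - (m_0 x_u + m_1 y_u + m_2) m_6}{D^2}, & \frac{\partial x_p}{\partial y_u} &= \frac{D m_1 - (m_0 x_u + m_1 y_u + m_2) m_7}{D^2}, \\
\frac{\partial y_p}{\partial x_u} &= \frac{D m_3 - (m_3 x_u + m_4 y_u + m_5) m_6}{D^2}, & \frac{\partial y_p}{\partial y_u} &= \frac{D m_4 - (m_3 x_u + m_4 y_u + m_5) m_7}{D^2}
\end{aligned} \qquad (9)$$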

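Finally, the last set of formulas, presented in [6], is

$$\begin{aligned}
\frac{\partial x_u}{\partial k_1} &= r^2 (x - c_x)/s_x \\
\frac{\partial x_u}{\partial k_2} &= r^4 (x - c_x)/s_x \\
\frac{\partial y_u}{\partial k_1} &= r^2 (y - c_y) \\
\frac{\partial y_u}{\partial k_2} &= r^4 (y - c_y) \\
\frac{\partial x_u}{\partial c_x} &= 1 - (1/s_x)(1 + k_1 r^2 + k_2 r^4) - 2(k_1 + 2 k_2 r^2)(x - c_x)^2 / s_x^3 \\
\frac{\partial x_u}{\partial c_y} &= -2(k_1 + 2 k_2 r^2)(x - c_x)(y - c_y) / s_x \\
\frac{\partial y_u}{\partial c_x} &= -2(k_1 + 2 k_2 r^2)(x - c_x)(y - c_y) / s_x^2 \\
\frac{\partial y_u}{\partial c_y} &= 1 - (1 + k_1 r^2 + k_2 r^4) - 2(y - c_y)^2 (k_1 + 2 k_2 r^2) \\
\frac{\partial x_u}{\partial s_x} &= -(x - c_x)(1 + k_1 r^2 + k_2 r^4) / s_x^2 - 2(k_1 + 2 k_2 r^2)(x - c_x)^3 / s_x^4 \\
\frac{\partial y_u}{\partial s_x} &= -2(y - c_y)(k_1 + 2 k_2 r^2)(x - c_x)^2 / s_x^3
\end{aligned} \qquad (10)$$

where $r$ was defined previously in eq. 1.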
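
Analytic derivatives like those of eq. 7 are easy to get wrong, so a quick numerical cross-check can be useful. The sketch below compares them with central finite differences. The parameter values, the step `h` and all function names are hypothetical choices made only for this illustration.

```python
def project(xu, yu, m):
    """Projective mapping of eq. 2 for the parameter tuple m = (m0, ..., m7)."""
    d = m[6] * xu + m[7] * yu + 1.0
    return (m[0] * xu + m[1] * yu + m[2]) / d, (m[3] * xu + m[4] * yu + m[5]) / d


def analytic_gradients(xu, yu, m):
    """Derivatives of x_p and y_p with respect to m0..m7, as listed in eq. 7."""
    d = m[6] * xu + m[7] * yu + 1.0
    xp, yp = project(xu, yu, m)
    dxp = [xu / d, yu / d, 1.0 / d, 0.0, 0.0, 0.0, -xu * xp / d, -yu * xp / d]
    dyp = [0.0, 0.0, 0.0, xu / d, yu / d, 1.0 / d, -xu * yp / d, -yu * yp / d]
    return dxp, dyp


def numeric_gradients(xu, yu, m, h=1e-6):
    """Central finite differences, used only to check the analytic formulas."""
    dxp, dyp = [], []
    for i in range(8):
        hi, lo = list(m), list(m)
        hi[i] += h
        lo[i] -= h
        xh, yh = project(xu, yu, hi)
        xl, yl = project(xu, yu, lo)
        dxp.append((xh - xl) / (2.0 * h))
        dyp.append((yh - yl) / (2.0 * h))
    return dxp, dyp


m = (1.1, 0.05, 3.0, -0.02, 0.95, -2.0, 1e-4, -2e-4)
analytic = analytic_gradients(120.0, 80.0, m)
numeric = numeric_gradients(120.0, 80.0, m)
worst = max(abs(a - n) / max(1.0, abs(a))
            for ana, num in zip(analytic, numeric)
            for a, n in zip(ana, num))
print("largest relative discrepancy:", worst)
```

The same check could be extended to the distortion derivatives of eq. 10 by differencing the distortion model of eq. 1.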
4.2 The Calibration Process

The calibration process starts with one image from the camera, $I_d$, another
image of the calibration pattern, $I_r$, and initial values for the parameters
$\theta$. The GNLM algorithm is as follows:

1. Compute the total error, $E_t(I_p(\theta), I_r)$ (eq. 3).

2. Pick a modest value for $\lambda$, say $\lambda = 0.001$.

3. Compute the image $I_p$ (eq. 1, 2), applying bilinear interpolation to improve
the quality of the image.

4. Solve the linear system of equations (4), and calculate
$E_t(I_p(\theta + \delta\theta), I_r)$.

5. If $E_t(I_p(\theta + \delta\theta), I_r) > E_t(I_p(\theta), I_r)$, increase
$\lambda$ by a factor of 10, and go to the previous step. If $\lambda$ grows very
large, it means that there is no way to improve the solution $\theta$.

6. If $E_t(I_p(\theta + \delta\theta), I_r) < E_t(I_p(\theta), I_r)$, decrease
$\lambda$ by a factor of 10, replace $\theta$ by $\theta + \delta\theta$, and go to
the first step.

When $\lambda = 0$, the GNLM method is a Gauss-Newton method, and when $\lambda$
tends to infinity, $\delta\theta$ turns to the so-called steepest descent direction
and the size of $\delta\theta$ tends to zero.
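
The damping schedule of the steps above can be sketched on a toy problem. This is not the authors' implementation: instead of image differences it fits the hypothetical curve $y = a\,e^{bx}$ to synthetic data, and the names `solve`, `gnlm`, `lam_max` and `max_iter` are assumptions made for the example. Only the update rule (start at $\lambda = 0.001$, multiply by 10 on failure, divide by 10 on success) follows the text.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gnlm(residuals, jacobian, theta, lam=1e-3, lam_max=1e12, max_iter=100):
    """Minimize the sum of squared residuals with the damping rule of the text."""
    q = len(theta)
    e = residuals(theta)
    err = sum(v * v for v in e)                        # step 1: total error
    for _ in range(max_iter):
        jac = jacobian(theta)                          # p rows, q columns
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(q)] for i in range(q)]
        b = [-sum(r[i] * v for r, v in zip(jac, e)) for i in range(q)]
        accepted = False
        while lam <= lam_max:
            a = [[jtj[i][j] + (lam if i == j else 0.0) for j in range(q)]
                 for i in range(q)]
            step = solve(a, b)                         # step 4: solve eq. 4
            trial = [t + s for t, s in zip(theta, step)]
            e_trial = residuals(trial)
            err_trial = sum(v * v for v in e_trial)
            if err_trial < err:                        # step 6: accept, relax damping
                theta, e, err = trial, e_trial, err_trial
                lam /= 10.0
                accepted = True
                break
            lam *= 10.0                                # step 5: reject, damp more
        if not accepted:                               # lambda grew very large
            break
    return theta, err


# Synthetic data generated with a = 2 and b = 0.5 (hypothetical values).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]


def residuals(theta):
    a, b = theta
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(theta):
    a, b = theta
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]


theta, err = gnlm(residuals, jacobian, [1.0, 0.1])
print(theta, err)
```

In the calibration setting the residual vector would hold the pixel differences $e_k$ and the Jacobian rows would come from eqs. 6 to 10.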



5 The Point Correspondences Method 

This method tries to improve the calibration results obtained with the approach
described in the previous section. When the calibration ends, the undistorted and
projected image, $I_p$, is very similar to the pattern image, $I_r$. The idea is to
extract points of both images associated with distinctive features. In our
experiments we use corners as features because they are easily detected with
subpixel precision.

The first step is to detect features in $I_r$ and then search for their
correspondences in $I_p$. This search is limited to a small area because $I_r$ and
$I_p$ are very similar. Let $n$ be the number of features, $(x_{rk}, y_{rk})$ be the
coordinates of a feature in $I_r$ and $(x_k, y_k)$ be its correspondence in $I_p$.
From $(x_k, y_k)$ and using eq. 1 and 2 we can get the coordinates
$(x_{pk}, y_{pk})$ of the feature in the camera image ($I_d$). These calculations
are denoted as follows: $x_{pk} = f_x^{pd}(\theta, x = x_k, y = y_k)$,
$y_{pk} = f_y^{pd}(\theta, x = x_k, y = y_k)$ and
$(x_{pk}, y_{pk}) = F^{pd}(\theta, x = x_k, y = y_k)$. So we have a set of pairs of
points $P = \{\langle (x_{r1}, y_{r1}), (x_{p1}, y_{p1}) \rangle, \ldots, \langle (x_{rn}, y_{rn}), (x_{pn}, y_{pn}) \rangle\}$.

We formulate the goal of the calibration as finding a set of parameters $\theta$
such that the sum, $D_t$, of square distances between points
$F^{pd}(\theta, x_{rk}, y_{rk})$ and $(x_{pk}, y_{pk})$ is a minimum,

$$\theta = \arg\min D_t(\theta, P) = \arg\min \sum_{k=1}^{n} \left[ (f_x^{pd}(\theta, x_{rk}, y_{rk}) - x_{pk})^2 + (f_y^{pd}(\theta, x_{rk}, y_{rk}) - y_{pk})^2 \right] \qquad (11)$$
Fig. 2. The calibration process



5.1 Non-linear Optimization 

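We again use the GNLM method to minimize $D_t$, but this time the elements of
matrix $A$ and matrix $B$ in eq. 4 are given by,

$$a_{ij} = \sum_{k=1}^{n} \left( \frac{\partial x_{pk}}{\partial \theta_i} \frac{\partial x_{pk}}{\partial \theta_j} + \frac{\partial y_{pk}}{\partial \theta_i} \frac{\partial y_{pk}}{\partial \theta_j} \right), \qquad b_i = -\sum_{k=1}^{n} \left( \frac{\partial x_{pk}}{\partial \theta_i} d_{xk} + \frac{\partial y_{pk}}{\partial \theta_i} d_{yk} \right) \qquad (12)$$

$$d_{xk} = f_x^{pd}(\theta, x_{rk}, y_{rk}) - x_{pk}, \qquad d_{yk} = f_y^{pd}(\theta, x_{rk}, y_{rk}) - y_{pk}$$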

6 Experimental Results 

We tested two Fire-i400 FireWire industrial color cameras from Unibrain with
4.00 mm C-mount lenses. These cameras acquire 30 fps at a resolution of 640 x 480
pixels.

The calibration pattern (image $I_r$), shown in Figure 2(a), was made using
the program xfig under Linux. The image taken by the camera is shown in Figure
2(b). The corrected and projected image, using the image registration method, is
shown in Figure 2(c). The GNLM process required 17 iterations and 57 seconds
(using a PC Pentium IV, l.SGhz). We apply derivatives of Gaussians with
$\sigma = 1$ pixel, and initial values of $\theta^d = (0, 0, 240, 320, 1)$ and
$\theta^p = (1, 0, 0, 0, 1, 0, 0, 0)$. At the end of the calibration process, the
total error, $E_t$, between the projected image $I_p(\theta)$ and $I_r$ (Figures
2(a) and (c)) was 14,820. This result is very good.

Corners of Figure 2(c) are easily detected by applying two derivative filters. We
apply a derivative of Gaussian ($\sigma = 2$ pixels) in one direction and then
another in the other direction (see Figure 3). Pixels around corners have higher
(or lower) values in Figure 3 (b). Corners are calculated, with subpixel
precision, as the center of mass of the pixels around the corners.
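
A center-of-mass estimate of this kind could look as follows. The sketch assumes the filter responses around one corner are available as a small window of numbers and weights each pixel by the absolute value of its response, since the text says responses may be higher or lower than their surroundings. The function name, the window contents and the offsets `x0`, `y0` are hypothetical.

```python
def center_of_mass(window, x0=0, y0=0):
    """Subpixel position of a corner inside a small window of filter responses.

    window[i][j] is the response at column x0 + j and row y0 + i.
    Weights are absolute responses (an assumption, not stated in the paper).
    """
    total = sum(abs(v) for row in window for v in row)
    if total == 0:
        raise ValueError("window has no response")
    cx = sum(abs(v) * (x0 + j) for row in window for j, v in enumerate(row)) / total
    cy = sum(abs(v) * (y0 + i) for i, row in enumerate(window) for v in row) / total
    return cx, cy


symmetric = [[0, 1, 0],
             [1, 4, 1],
             [0, 1, 0]]
skewed = [[0, 0, 0],
          [0, 2, 2],
          [0, 0, 0]]
print(center_of_mass(symmetric))            # centered response
print(center_of_mass(skewed, x0=10, y0=20)) # falls between two pixel columns
```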

The point correspondences method required 39 iterations and less than 5
seconds. This time, $E_t$ was 14,165, a slightly better result than with the other
method. The difference is more evident from the sum of square distances, $D_t$.
When using the image registration method we got $D_t = 518$, and $D_t = 131$ for
Fig. 3. Detecting corners



the point correspondences method, a significant reduction. This difference can
also be observed by calculating the maximum individual distance between points,
$\max_i \sqrt{d_{xi}^2 + d_{yi}^2}$. Using this criterion, the image registration
method gave 1.84 pixels and the point correspondences method 1.25 pixels.
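
Both criteria used in this comparison, the sum of square distances $D_t$ and the maximum individual distance, are simple to compute from a list of point pairs. The sketch below uses made-up coordinates purely for illustration; they are not the paper's data, and the function name is hypothetical.

```python
import math


def point_errors(predicted, observed):
    """Return (sum of squared distances, maximum individual distance)."""
    squared = [(px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predicted, observed)]
    return sum(squared), math.sqrt(max(squared))


# Illustrative point pairs only.
predicted = [(10.0, 10.0), (20.5, 30.0), (41.0, 39.0)]
observed = [(10.0, 11.0), (20.0, 30.0), (40.0, 40.0)]
d_t, d_max = point_errors(predicted, observed)
print(d_t, d_max)
```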

Finally, Figure 4 shows an application of the parameters obtained with the 
second method, 0^ = (-7.86 x 10-°^6.43 x IQ-^^ 217.75, 310.68, 1.00). Images 
were expanded from 640x480 pixels to 800 x 600, to see the complete expansion. 

7 Related Works 

There are two kinds of calibration methods. The first kind uses a calibration
pattern or grid with features whose world coordinates are known. The second
kind uses geometric invariants of the image features, like parallel lines,
spheres, circles, etc. [2].

The methods described in this paper are in the first family of methods. The
image registration method uses all points or pixels of the image as features,
instead of the set of point correspondences of the second method. The
correspondences with reference points are given implicitly in the pattern image (for




Fig. 4. Original and corrected images 
the first method) or computed automatically (for the second method). Other
methods require a human operator (with a lot of patience) to find such
correspondences [6].

This method is an improved version of the method proposed by Tamaki et
al. [6]. The differences between both approaches are:

- We take into account exact derivatives of $x_p$ and $y_p$ with respect to
$\theta^p$ (eq. 7). Tamaki uses an approximation that is valid only when $x_u$ and
$x_p$ (and $y_u$ and $y_p$) are very similar. This approximation makes the method
not very robust: convergence problems arise when the parameters $\theta$ are not
close to the right ones. Tamaki's method for the same images shown in Figure 2
gave us $E_t = 15310$, a slightly greater value than our method.

- We optimize the whole set of parameters $\theta$ using the GNLM method. Tamaki
applies the Gauss-Newton method twice, once for $\theta^d$ and once for $\theta^p$.

- We use a direct registration (from the camera image towards the pattern
image), while Tamaki uses inverse registration (from the pattern image to
the camera image).

8 Conclusions 

We have described two calibration methods based on the
Gauss-Newton-Levenberg-Marquardt non-linear optimization method using analytical
derivatives. Other approaches compute numerical derivatives (e.g. [1,2,4]), so we
have faster calculations and better convergence properties.

The first method is an image registration method, which is an improved version
of a previous one [6]. The second method takes advantage of the results from
the first method to solve the correspondence problem between features of the
camera image and the pattern image. It also takes advantage of detecting features
(corners) with subpixel precision. This combination gives better calibration
results than the image registration method alone.
Abstract. This paper presents a simple and original method that uses
a configuration of only two sonars to measure and characterize surfaces.
The method simultaneously uses the Time Of Flight (TOF) technique
and basic triangulation, and characterizes the obtained sonar data into
corners, edges and planes, along with non-classified points. The
characterization is based on a simple trigonometric evaluation. A commutation
system with two sonars that uses a configuration with a transmitter and
two receivers was built to verify the proposed methodology. Experiments
and satisfactory results are also presented.



1 Introduction 

Sonars are ultrasonic devices widely used in autonomous vehicles and robot nav- 
igation [1,2]. These sensors provide a cheap option to measure distances and to 
detect obstacles. The most common strategy used by sonars to obtain measure- 
ments is called Time Of Flight (TOF), which consists of sending an ultrasonic 
pulse and measuring the elapsed time until the echo returns after hitting an 
object. Although TOF measurement in several cases is simple and precise, its 
interpretation is difficult and tends to provide incorrect appreciations. An exam- 
ple of the results obtained by a sequence of readings with a rotating system of 
sonars (rotational scan) is shown in Figure 1, with the real environment super- 
imposed. The modelling of a certain environment with only a set of straight and 
well-defined lines is difficult because some surfaces cannot be clearly detected. 
This fact led to the abandonment of sonars as the sole means of navigation [3].
In spite of this, some variants of the original TOF technique have recently proved 
to be useful in environment mapping and characterization. These variants are 
characterized by increasing the number of receiving sonars for each sonar emis- 
sion. This approach obtains quite reliable measurements, thus better representa- 
tions of the studied environment [4,5]. These works have been developed testing 
different quantities of receiving sonars (2 to 4) by one transmitting sonar. The 
transducers used as transmitters are activated one by one, but never at the 
same time. These methods are based on information redundancy obtained after 
activating all the transmitting sonars [6]. All of these investigations use more 
than two sonars. In addition, a system capable of distinguishing objects with 
Fig. 1. Scan of a sonar, the real environment is superimposed for comparison.



a minimum of two transmitters and two receivers has been described [7]. This 
configuration uses three sonars: a sonar used exclusively as transmitter, another 
used exclusively as receiver, whereas the third one has a transmitter/receiver 
function. In all the previous methods, the characterization is based on complex 
probabilistic estimation. 

This article describes a simple and cheap method for characterizing indoor
environments. It consists of using two sonars simultaneously under a
one-transmitter, two-receiver configuration, in such a manner that complementary
TOF values are obtained. Using triangulation, these values allow us to classify
the measurements into concave corners, edges, planes and non-classified points.
The sonar configuration is similar to the one used in [8]; however, in our
research, both sonars alternate the transmitting role. In this investigation the
structures of interest were mainly polygonal indoor environments with right angles.

The proposed method is optimal in the sense that, in order to achieve
triangulation, the minimum number of required sonars is used. At the same time,
the characterization is based on a very simple trigonometric evaluation, whose
equations are also provided in this paper.

Furthermore, a very cheap system of sonars was built to verify the method. 
The system only uses two Polaroid© 6500 modules to sense the environment. 

This paper is organized as follows: Section 2 analyses the configuration with 
one transmitter and two receivers. In Section 3 some experiments and their 
results are shown. Finally, Section 4 presents the conclusions of this work. 



2 A Commutated Transmitter with Two Receivers 

The signal emitted by the sonars can behave in two different ways. If the
dimension of the surfaces that produce the echo is larger than the wavelength of
the sonar, the signal will be reflected. Otherwise, the signal will be diffracted.
The reflecting surfaces return the signal according to the law of reflection,
causing a specular reflection. Planes and corners are in this category. In
contrast, the diffracting surfaces return the signal in all directions in a
similar way to diffuse reflection, so the echo signal decreases very fast. Edges
are included in this category.

Figure 2(a) describes the configuration used in this research. The commutating
sonars ($T_1$ and $T_2$) are separated by a distance $b$. First, sonar $T_1$
transmits and both sonars ($T_1$ and $T_2$) receive the signal, obtaining two
measurements ($r_{11}$ and $r_{12}$). Afterwards, sonar $T_2$ transmits and its
signal is received by both sonars, obtaining two more measurements ($r_{21}$ and
$r_{22}$). $r_{ij}$ is the distance obtained when sonar $T_i$ transmits and sonar
$T_j$ receives. Therefore, the proposed scheme of a transmitter and two receivers
obtains four measurements to calculate the distance between the object and the
sonars, which helps to classify the objects as concave or convex, that is, as
corners or edges, respectively.

Next, measurements on different surfaces are analyzed, in the following order:
plane, corner and edge. To distinguish between these types of surfaces, the
relationship $r_{12} + r_{21} - (r_{11} + r_{22})$ is analyzed; it corresponds to
the sum of the crossed distances (different transmitter than receiver) minus the
sum of the direct distances (same transmitter and receiver). Moreover, in each
case we derive the distance $a$, which is the length from the center of the
arrangement of sonars to the analyzed point, and the angle $\phi$, which is the
angle between $a$ and the perpendicular to $b$. These two variables represent the
most accurate measurements that can be obtained for both distance and angle. This
is shown in Figure 2(a). A more detailed description of the obtained equations is
found in [9].

To validate these calculations, we assume that $r_{12} = r_{21}$, which establishes
that, in an environment that does not change in time and in which each sonar is
inside the other's range, the crossed distances (different transmitter than
receiver) should be the same.

Finally, note that the reflections of the sonars are used in the plane and corner
analyses [7] to make them easier.

Analyses of the three cases (plane, corner and edge) are presented in the
next subsections.



2.1 Plane 

In Figure 2(a) the reflections of the sonars are shown. Distances $r_{ij}$
represent the distance obtained from sonar $T_i$ to the reflection of sonar $T_j$
($T_j'$); for example, $r_{11}$ goes from $T_1$ to $T_1'$.

Along with these distances, Figure 2(a) shows the following variables:
$b$: distance from $T_1$ to $T_2$. $b = |\mathbf{b}|$, where $\mathbf{b} = \overrightarrow{T_2 T_1}$.
$a$: distance between the midpoint of $\mathbf{b}$ and the plane. $a = |\mathbf{a}|$,
where $\mathbf{a}$ is the vector that goes from the midpoint of $\mathbf{b}$ to the plane.
$r_{ij}$: distance from $T_i$ to $T_j'$. $r_{ij} = |\mathbf{r}_{ij}|$, where $\mathbf{r}_{ij} = \overrightarrow{T_i T_j'}$.
$\alpha_1$: angle from the perpendicular to $\mathbf{b}$ to $\mathbf{r}_{11}$.
$\alpha_2$: angle from the perpendicular to $\mathbf{b}$ to $\mathbf{r}_{12}$.
$\beta$: angle from $\mathbf{r}_{11}$ to $\mathbf{r}_{12}$.
$\phi$: angle from the perpendicular to $\mathbf{b}$ to $\mathbf{a}$.

Fig. 2. Plane, Corner and Edge analysis. In (a) and (c) the reflection of both
sonars over the wall ($T_1'$ and $T_2'$) is illustrated. Besides, the four
distances $r$ are observed in all three cases (based on [6]).

From these values we will deduce $a$, $\phi$, and the behavior of
$r_{12} + r_{21} - (r_{11} + r_{22})$.

According to Figure 2(a):

$$r_{11} + r_{22} = 4a, \qquad (1)$$

and from the triangle formed by $r_{12}$ and $b \cos\alpha_1$ it can be
demonstrated that its base is equal to $2a$. Now, to determine the difference
$r_{12} + r_{21} - (r_{11} + r_{22})$, first $r_{11}$ is derived,

$$r_{11} = b \sin\alpha_1 + 2a, \qquad (2)$$

and from the right triangle we obtain:

$$r_{12} = r_{21} = \sqrt{4a^2 + b^2 \cos^2\alpha_1}. \qquad (3)$$

Adding the crossed distances, considering $a \gg b$, and rearranging some terms,

$$r_{12} + r_{21} = 2\sqrt{4a^2 + b^2 \cos^2\alpha_1} \approx 4a \left(1 + \frac{b^2 \cos^2\alpha_1}{8a^2}\right); \qquad (4)$$

then, the relationship of the crossed and direct distances is obtained:

$$r_{12} + r_{21} - (r_{11} + r_{22}) \approx \frac{b^2 \cos^2\alpha_1}{2a}. \qquad (5)$$

Finally, $\phi$ is obtained easily, since it is equal to angle $\alpha_1$,

$$\phi = \alpha_1 = \arcsin\left(\frac{r_{11} - r_{22}}{2b}\right). \qquad (6)$$

As is clearly observed in equation (5), the result is always positive and
varies according to $a$ and $\alpha_1$, since $b$ remains constant once it is defined.
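
The plane relations can be checked numerically. The sketch below generates the four readings from chosen values of $a$, $b$ and $\alpha_1$ using eqs. (1) to (3), then compares the exact crossed-minus-direct difference with the approximation of eq. (5) and recovers the angle with eq. (6). The numbers (2 m, 0.2 m, 0.3 rad) and the function name are hypothetical, chosen only so that $a \gg b$ holds.

```python
import math


def plane_readings(a, b, alpha1):
    """Ideal readings (r11, r12, r21, r22) in front of a plane, eqs. (1)-(3)."""
    r11 = 2.0 * a + b * math.sin(alpha1)
    r22 = 4.0 * a - r11                           # from r11 + r22 = 4a
    r12 = math.sqrt(4.0 * a * a + (b * math.cos(alpha1)) ** 2)
    return r11, r12, r12, r22


a, b, alpha1 = 2.0, 0.2, 0.3                      # illustrative values only
r11, r12, r21, r22 = plane_readings(a, b, alpha1)

exact = (r12 + r21) - (r11 + r22)
approx = (b * math.cos(alpha1)) ** 2 / (2.0 * a)  # eq. (5)
phi = math.asin((r11 - r22) / (2.0 * b))          # eq. (6)

print(exact, approx, phi)
```

For these values the two differences agree to within a fraction of a millimetre, which gives a feeling for how small the quantity being tested is compared with the readings themselves.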
2.2 Corner

Figure 2(c) describes the corner analysis. It is worth mentioning that the
analysis is valid only if the corner's angle is 90°. In Figure 2 a reflection is
made, similar to the one performed for the plane, adding or modifying the
following variables:

$a$: distance between the center of $\mathbf{b}$ and the corner's vertex. This
definition differs from the one presented in the previous section.
$\gamma_1$: angle from $\mathbf{b}$ to $\mathbf{r}_{21}$.
$\gamma_2$: angle from $\mathbf{r}_{12}$ to $-\mathbf{b}$.

As in the previous case, we will obtain $a$, $\phi$ and
$r_{12} + r_{21} - (r_{11} + r_{22})$. $a$ is defined by

$$r_{12} + r_{21} = 4a, \qquad (7)$$

$$r_{12} = r_{21} = 2a. \qquad (8)$$

From the triangle formed by $b$, $r_{11}$ and $r_{21}$, and making use of the law
of cosines,

$$r_{11} = \sqrt{r_{21}^2 + b^2 - 2 r_{21} b \cos\gamma_1}. \qquad (9)$$

From Figure 2(c), $\gamma_1 = \frac{\pi}{2} + \alpha_2$ is obtained. Using a change
of variable in (9), the identity $\cos(\frac{\pi}{2} + \alpha) = -\sin\alpha$ and
$r_{21} = 2a$,

$$r_{11} = \sqrt{r_{21}^2 + b^2 - 2 r_{21} b \cos\left(\frac{\pi}{2} + \alpha_2\right)} \approx 2a + \frac{b^2}{4a} + b \sin\alpha_2. \qquad (10)$$

It can also be shown that:

$$r_{22} \approx 2a + \frac{b^2}{4a} - b \sin\alpha_2. \qquad (11)$$

Using equations (10) and (11) in $r_{12} + r_{21} - (r_{11} + r_{22})$,

$$4a - \left(2a + \frac{b^2}{4a} + b \sin\alpha_2\right) - \left(2a + \frac{b^2}{4a} - b \sin\alpha_2\right) \approx -\frac{b^2}{2a}. \qquad (12)$$

In agreement with equation (12), the result for the corners is always negative,
varying according to $a$. Finally, in Figure 2 it is observed that
$\phi = \alpha_2$. It is not possible to get a general equation for corners whose
angle is different from 90° using the described analysis, because in those cases
the relation $|GU| = |GU'| = |G'F| = |G'F'|$ is no longer true and it is not valid
to use the reflection of the sonars.



2.3 Edge 

The edge analysis is based on Figure 2(b). In this case, it is not possible
to use the reflection of the sonars or virtual image, because the distances are
not conserved when the reflection is done. Due to the lack of a virtual image, the
distances observed in Figure 2(b) only represent half of the measurements
$r_{11}$ and $r_{22}$.

For the convex corners case, it can be shown that
$r_{12} + r_{21} - (r_{11} + r_{22}) = 0$; this is because

$$r_{12} = r_{21} = \frac{r_{11}}{2} + \frac{r_{22}}{2}. \qquad (13)$$
Then a is obtained using the sinus law with the triangle formed by |



and a, deriving 

Finally, (p is obtained, 






r2 2 b 



■ cos 7i . 



(14) 



(j) = 



7T 

2 



— cos 




(15) 




(a) Experiment environment (b) First view 



Fig. 3. Experiment environment and different views of the designed environment and 
the experimental setup. In (a) each capital letter represents a line in the environment. 



3 System Implementation and Experiments 

To verify the theory described in this article, a system with an arrangement of
two sonars mounted on a rotating mechanism (see Figure 3) was built, and
a graphic interface was developed in Java to visualize, store and analyze the
collected data. The system is composed of two Polaroid 6500 modules to control
the sonar transducers.

The experimental environment is similar to that of an indoor robot. Views
of the environment and its dimensions are illustrated in Figure 3.

The environment was built with the following considerations in mind. First,
it was built using a material that adequately reflects the sonar signal; wooden
tables and desks were used for this purpose. In addition, the surfaces should be
flat and without holes.

The first experimental step consists of scanning the environment to obtain the
readings. The system takes around two minutes to obtain a scan, and it senses
100 points in one scan (400 $r$ distances). Then, the information is processed and
its characterization is obtained. Examples of the results of this process are shown
Fig. 4. Scan characterizations from a given position. The points are characterized
as follows: corners, black circles; planes, white; and edges, gray. In (b) the
distance $a$ and angle $\phi$ mentioned in the text are shown and represented by
the lines that come out of the center of the figure. Non-classified points were
assigned a default value of $\phi = 0$ and don't have a circle to represent them.
The real environment is shown in dashed lines.



in Figures 4(a) and 4(b). The circles are points that the system recognizes as
corners, planes or edges. It is observed in the figures that it is difficult to
detect the edges correctly. The dashed line in these figures represents the real
environment.

It is also observed in the figures that not all the scan readings are
classified as corners, edges, or planes. These are the non-classified points,
which arise because:

1. The condition that the crossed distances must be equal ($r_{12} = r_{21}$) is
not satisfied, probably because the signal of one sonar is not detected by the
other (receiver).

2. One or both of the sonars are unable to detect the surfaces to be measured.

3. The sum of the crossed distances minus the sum of the direct distances does
not correspond to any of the three aforementioned cases.
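
The decision logic implied by the three cases and by the rejection reasons above can be sketched as follows. This is only one possible reading of the method: the tolerances `eps` and `cross_tol`, the acceptance margins (factors 1.5 and 0.5) and the function name are hypothetical and would have to be tuned to the noise of a real sonar. Missing echoes (reason 2) are not modeled.

```python
def classify(r11, r12, r21, r22, b, eps=0.002, cross_tol=0.01):
    """Label one reading as 'plane', 'corner', 'edge' or 'unclassified'.

    Distances and the sonar separation b share one unit (metres assumed here).
    """
    if abs(r12 - r21) > cross_tol:           # reason 1: crossed distances differ
        return "unclassified"
    d = (r12 + r21) - (r11 + r22)            # crossed minus direct distances
    if abs(d) <= eps:                        # edge: the difference vanishes
        return "edge"
    if d > 0:                                # plane: d is about b^2 cos^2(alpha1) / (2a)
        a = (r11 + r22) / 4.0
        return "plane" if d <= 1.5 * b * b / (2.0 * a) else "unclassified"
    a = (r12 + r21) / 4.0                    # corner: d is about -b^2 / (2a)
    expected = b * b / (2.0 * a)
    return "corner" if abs(-d - expected) <= 0.5 * expected else "unclassified"


# Ideal readings for a = 2, b = 0.2 and an angle of 0.3 rad (illustrative values).
print(classify(4.059104, 4.004561, 4.004561, 3.940896, 0.2))  # plane geometry
print(classify(4.063598, 4.0, 4.0, 3.945525, 0.2))            # 90 degree corner
print(classify(3.0, 3.5, 3.5, 4.0, 0.2))                      # edge geometry
print(classify(3.0, 3.5, 3.9, 4.0, 0.2))                      # crossed distances differ
```

Because the plane and corner differences are of the order of $b^2/(2a)$, a few millimetres in this example, the choice of `eps` relative to the sensor resolution largely decides how many readings end up unclassified.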

As shown in Figure 4, the system has a detection level of more than 80% for
corners and planes (in readings that should be characterized as such entities),
but only reaches a level of 40% for edges, since they are very difficult to measure
due to the diffraction of the echo. It is worth mentioning that when a reading is
considered as not classified, the system uses for analysis the smallest of the four
$r$ values obtained at that point.

Note that Figure 4(a) shows groups of readings mainly in the corners and
planes of the real environment. In Figure 4(b) the angle $\phi$ is used, and it is
observed how the characterized points converge on the corners and planes of the
real environment.

This technique has advantages over comparable vision techniques, e.g. structured
light or stereoscopic vision, mainly in price and in the amount of information
that needs to be processed.

4 Conclusions 

This investigation proposes a method that uses two sonars for the measurement
and characterization of surfaces. The method is based on the combination
of the TOF technique and triangulation, and applies basic trigonometric
calculations to differentiate among corners, edges and planes.

The experiments were performed in an environment built with tables and
desks in order to validate and evaluate the behavior of the implemented system.

The system recognized corners and planes correctly, but edges were more
difficult.

The system proved to be reliable and efficient. In addition, the system is
very cheap; no expensive hardware is required.

The system could be improved, especially with regard to noise problems with
sonars, characterization of discontinuous segments and verification of erroneous
segments generated by data segmentation.

Finally, based on the obtained results we conclude that the proposed method
is reliable when used in environments complying with the aforementioned
requirements. In addition, this research leaves a solid base for future work that
requires sonar systems. For example, the system can be included in robots and
autonomous vehicles that require more precise, cheap and reliable environment
modelling for better navigation.
Abstract. Non-speech audio gives important information about the environment that
can be used in robot navigation together with other sensor information. In this
article we propose a new methodology to study non-speech audio signals with pattern
recognition techniques in order to help a mobile robot to self-localize in the
space domain. The feature space is built with the most relevant coefficients of
signal identification after a wavelet transformation preprocessing step, given the
non-stationary property of this kind of signal.



1 Introduction 

Sound offers advantages for information systems in the delivery of alerts, duration
information, encoding of rapidly incoming information, and representing position in
the 3-D space around a person and their localization. Hearing is one of human
beings' most important senses. After vision, it is the sense most used to gather
information about our environment. Despite this, little research has been done into
the use of sound by a computer to study its environment. The research that has been
done focuses mainly on speech recognition [1], [2], while research into other types
of sound recognition has been neglected. In robotics, non-speech audio has been
ignored in favor of artificial vision, laser beams and mechanical wave sensors
beyond the audible spectrum. But the study and modeling of non-speech audio can
greatly help robot navigation and localization in the space domain. The existing
research in non-speech sound is incipient and focuses on signal processing
techniques for feature extraction with the use of neural networks as a
classification technique [3], [4]. In this article a new technique based on pattern
recognition techniques is proposed in order to locate a robot in the space domain
by non-speech audio signals. The feature space is built with the coefficients of
the model identification of audio signals. Due to their non-stationary property,
wavelet decomposition is needed as a preprocessing step. We also propose a technique
(transform function) to convert the samples in the feature space into the space
domain, based on the partial differential equation of sound described in [1]. In
section 2 the feature selection and feature vector are described, as well as the
procedure to obtain the transform function. In section 3 we present an experiment
in order to test the proposed algorithms and techniques.
2 Non-speech Audio Feature Extraction Approach for Localization
in Space Domain

In this section we propose a new approach to localization in the space domain from
non-speech audio signals, to be applied to a robot in an industrial environment.
The approach follows these steps: 1) measurement and data preprocessing; 2)
identification of MAX models of the signals by the wavelet transform; 3) feature
selection, feature extraction and its correspondence with the space domain. A
non-speech audio signal generated by any audio source (industrial machinery,
appliances, etc.) is continuous by nature. Preliminary non-speech signal
preprocessing includes sampling the analog audio signal at a specific frequency and
converting it into a discrete set of samples. The sampling interval should be
chosen in such a way that essential information is preserved. In this case, given
the form of the audio signal, we have followed the same criteria as [5] to choose
the sampling frequency because of its similarity to speech signals.



2.1 Model Identification by the Wavelet Transform and Feature Selection 

Non-speech audio signals are non-stationary, in the same way as many real signals
encountered in speech processing, image processing, ECG analysis, communications,
control and seismology. To represent the behavior of a stationary process, it is
common to use models (AR, ARX, ARMA, ARMAX, OE, etc.) obtained from experimental
identification [6]. The coefficient estimation can be done with different criteria:
LSE, MLE, among others. But in the case of non-stationary signals the classical
identification theory and its results are not suitable. Many authors have proposed
different approaches to modeling this kind of non-stationary signals, which can be
classified as follows: i) assuming that a non-stationary process is locally
stationary in a finite time interval so that various recursive estimation
techniques (RLS, PLR, RIV, etc.) can be applied [6]; ii) state space modeling and
Kalman filtering; iii) expanding each time-varying parameter coefficient onto a set
of basis sequences [7]; and iv) nonparametric approaches for non-stationary
spectrum estimation, such as the local evolving spectrum, STFT and WVD, which have
also been developed to characterize non-stationary signals [8].

To overcome the drawbacks of these identification algorithms, wavelets can be
considered for time-varying model identification. The distinctive feature of a wavelet
is its multiresolution characteristic, which is very suitable for non-stationary signal
processing [9]. The wavelet transform adaptively decomposes the space $L^2(\mathbb{R})$
into a linear combination of orthogonal subspaces, which divide the whole frequency
band into a series of subbands from high to low frequency, representing the
multiresolution characteristics of the original signal.

As non-speech audio signals are non-stationary and have very complex waveforms
because they are composed of various frequency components, a signal transformation
is performed. The idea of the signal transformation is to separate the incoming signal
into frequency bands. This task may be solved with a filter bank or a wavelet
transform, since psychoacoustics has associated human hearing with non-uniform
critical bands. These bands can be realized roughly as a four-level dyadic tree. For
sampling at 8 kHz the frequency bands of the dyadic tree are 0-250 Hz, 250-500 Hz,
500-1000 Hz, 1000-2000 Hz and 2000-4000 Hz. Each input signal is decomposed into 4
levels, that is, the audio signal $S_i = A4_i + D4_i + D3_i + D2_i + D1_i$, where $A4_i$
is the approximation of the original signal $S_i$ and $Dj_i$ ($j = 1, \dots, 4$) are the
detail signals for $S_i$.
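
The band edges quoted above follow from repeatedly halving the band below the Nyquist frequency. The following is a minimal Python sketch of that arithmetic. It assumes ideal, non-overlapping subbands (real wavelet filters overlap at the band edges), and the helper name `dyadic_bands` is hypothetical.

```python
def dyadic_bands(sample_rate_hz, levels):
    """Return (name, low_hz, high_hz) for each band of a dyadic wavelet tree."""
    nyquist = sample_rate_hz / 2.0
    bands = []
    high = nyquist
    for level in range(1, levels + 1):
        low = high / 2.0
        # Detail band D<level> covers the upper half of what is left.
        bands.append((f"D{level}", low, high))
        high = low
    # The remaining low-pass part is the approximation A<levels>.
    bands.append((f"A{levels}", 0.0, high))
    return bands


for name, low, high in dyadic_bands(8000, 4):
    print(f"{name}: {low:.0f}-{high:.0f} Hz")
```

With a sampling frequency of 8 kHz and four levels, the approximation band is 0-250 Hz and the four detail bands cover 250 Hz up to the Nyquist frequency of 4000 Hz.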

The wavelet transform has been computed with the Daubechies wavelet, because it
captures the characteristics and information of the non-speech audio signals very well.
This set of wavelets has been used extensively, since its coefficients capture the
maximum amount of the signal energy [9].

A MAX model (Moving Average eXogenous) represents the signals sampled at
different points of the space domain, because these signals are correlated. We use the
signal closest to the audio source as the input signal for the model. Only the model
coefficients need to be stored to compare and discriminate the different audio signals.
This would not be the case if the signal were represented by an AR model, because its
coefficients depend on the signal itself and, with a different signal at every point of the
space domain, they would not be significant enough to discriminate the audio
signals. When the model is identified through the wavelet transform, the coefficients
that do not provide enough information for the model are ignored: the eigenvalues
of the covariance matrix are analyzed and the coefficients without discriminatory
power are rejected. For the estimation of each signal, the approximation signal and
its significant details are used in the following process: i) model structure selection;
ii) calibration of the model parameters with an estimation method (the LSE method can
be used for its simplicity and, furthermore, it ensures good convergence of the
identified model coefficients); iii) validation of the model.
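
As an illustration of step ii), the sketch below estimates the exogenous coefficients $b_k$ of a model $y(n) = \sum_k b_k u(n-k)$ by ordinary least squares. It is a simplification and not the authors' procedure: the coefficients are assumed constant in time, the noise term is omitted (fitting it would need an iterative scheme, because $e(n)$ is not observed), and all function names and test data are hypothetical.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_fir_least_squares(u, y, order):
    """Estimate b_0..b_order in y(n) = sum_k b_k u(n-k) by least squares."""
    # Each regressor row holds u(n), u(n-1), ..., u(n-order).
    rows = [[u[n - k] for k in range(order + 1)] for n in range(order, len(y))]
    targets = y[order:]
    # Normal equations: (Phi^T Phi) b = Phi^T y.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order + 1)]
            for i in range(order + 1)]
    moment = [sum(r[i] * t for r, t in zip(rows, targets))
              for i in range(order + 1)]
    return solve_linear_system(gram, moment)


# Synthetic, noise-free check with made-up coefficients.
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_b = [0.5, -0.3, 0.2]
y = [sum(b * u[n - k] for k, b in enumerate(true_b) if n - k >= 0)
     for n in range(len(u))]
estimated = fit_fir_least_squares(u, y, order=2)
print(estimated)
```

On noise-free synthetic data the estimate should reproduce the generating coefficients up to rounding; with real recordings the residuals would then be inspected in the validation step iii).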

Let us consider the following TV-MAX model, with $S_i = y(n)$,

$$y(n) = \sum_{k=0}^{q} b(n;k)\,u(n-k) + \sum_{k=0}^{r} c(n;k)\,e(n-k) \qquad (1)$$

where $y(n)$ is the system output, $u(n)$ is the observable input, which is assumed to be
the signal closest to the audio source, and $e(n)$ is a noise signal. The second term is
necessary whenever the measurement noise is colored and needs further modeling. In
discrete time, wavelet expansions are computed through filter banks. We now expand the
coefficients $b(n;k)$ and $c(n;k)$ onto a wavelet basis,

$$y(n) = T_1(n) + T_2(n) \qquad (2)$$

where



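$$T_1(n) = \sum_{k=0}^{q}\sum_{m} \zeta^{(k)}_{J_{max},m}\,\big[\tilde h_{J_{max}}(n - 2^{J_{max}} m)\,u(n-k)\big] + \sum_{k=0}^{q}\sum_{j=J_{min}}^{J_{max}}\sum_{m} \xi^{(k)}_{j,m}\,\big[\tilde g_{j}(n - 2^{j} m)\,u(n-k)\big] \qquad (3)$$

$$T_2(n) = \sum_{k=0}^{r}\sum_{m} \bar\zeta^{(k)}_{J_{max},m}\,\big[\tilde h_{J_{max}}(n - 2^{J_{max}} m)\,e(n-k)\big] + \sum_{k=0}^{r}\sum_{j=J_{min}}^{J_{max}}\sum_{m} \bar\xi^{(k)}_{j,m}\,\big[\tilde g_{j}(n - 2^{j} m)\,e(n-k)\big] \qquad (4)$$

Here $\zeta$ and $\xi$ denote the approximation and detail wavelet coefficients of $b(n;k)$,
$\bar\zeta$ and $\bar\xi$ those of $c(n;k)$, and $\tilde h_{J_{max}}$ and $\tilde g_j$ the corresponding
synthesis filters at each resolution level.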

Let $h_0(n)$ and $h_1(n)$ be a dyadic Perfect Reconstruction Filter Bank (PRFB).

Then, for a fixed $k$, the wavelet coefficients corresponding to the low-resolution
and the detail signal of $b(n;k)$ are given by

$$\zeta^{(k)}_{m} = \sum_{l} h_0(l)\, b(2m - l; k) \qquad (5)$$

and

$$\xi^{(k)}_{m} = \sum_{l} h_1(l)\, b(2m - l; k) \qquad (6)$$

respectively. Therefore the signal $b(n;k)$ can be reconstructed from $\zeta^{(k)}_{m}$ and
$\xi^{(k)}_{m}$ by the synthesis equation,

$$b(n;k) = \sum_{m} \zeta^{(k)}_{m}\,\tilde h_0(n - 2m) + \sum_{m} \xi^{(k)}_{m}\,\tilde h_1(n - 2m) \qquad (7)$$

where $\tilde h_i(n) = h_i(-n)$, $i = 0, 1$. See reference [9] for further details. The
$c(n;k)$ coefficients are obtained by the same procedure.



2.2 Feature Extraction and Spatial Recognition 

The coefficients of the different models will be used as the feature vector, which can
be defined as $X_S$,

$$X_S = (b_0, b_1, \dots, b_q, c_0, c_1, \dots, c_r)^T \qquad (8)$$

where $q+1$ and $r+1$ are the numbers of $b$ and $c$ coefficients, respectively. Every
input signal yields a new feature vector, representing a new point in the
$(q+r+2)$-dimensional feature space, fs. For feature selection, it is not necessary to
apply any statistical test to verify that each component of the vector has enough
discriminatory power, because this step has already been done in the wavelet transform
preprocessing.

This feature space will be used to classify the different audio signals entering the
system. For this reason we need some labeled samples with their precise position in
the space domain (a specific experiment is shown in the following section). When an
unlabeled sample enters the feature space, the minimum distance to a labeled sample is
computed, and this distance measure is used to estimate the distance to the same
sample in the space domain. We therefore need a transformation function that
converts distance in the feature space into distance in the space domain,
$f_T: \mathbb{R} \to \mathbb{R}$ ($f_T$: ($(q+r+2)$-D fs) $\to$ (2-D $x$-$y$ space domain)). Note that
the distance is a scalar value, independent of the dimension of the space in which it
has been computed.

The Euclidean distance is used, and the distance between two samples $S_i$ and $S_j$ in the
feature space is defined as

$$d_{fs}(S_i, S_j) = \sqrt{\sum_{k=0}^{q} (b_{k,S_i} - b_{k,S_j})^2 + \sum_{k=0}^{r} (c_{k,S_i} - c_{k,S_j})^2} \qquad (9)$$

where $b_{k,S_i}$ and $c_{k,S_i}$ are the $b$ and $c$ coefficients, respectively, of the wavelet
transform for the signal $S_i$. It is not necessary to normalize the coefficients before the
distance calculation, because they are already normalized intrinsically by the wavelet
transformation.
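
The distance of equation (9) and the search for the closest labeled samples can be sketched as follows. This is a minimal illustration with made-up two-component feature vectors; the names `feature_distance` and `three_nearest` are hypothetical, and a feature vector is assumed to be the concatenation of the $b$ and $c$ coefficients.

```python
import math


def feature_distance(x_a, x_b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_a, x_b)))


def three_nearest(unlabeled, labeled):
    """Return the three (name, distance) pairs closest to the unlabeled vector.

    `labeled` maps a sample name to its feature vector.
    """
    ranked = sorted(
        ((name, feature_distance(unlabeled, vec)) for name, vec in labeled.items()),
        key=lambda item: item[1],
    )
    return ranked[:3]


# Toy feature vectors, not measured data.
labeled_samples = {
    "S1": [0.0, 0.0],
    "S2": [3.0, 4.0],
    "S3": [6.0, 8.0],
    "S4": [1.0, 0.0],
}
nearest = three_nearest([0.0, 0.0], labeled_samples)
print(nearest)
```

Because the $b$ and $c$ terms enter equation (9) in the same way, concatenating them into one vector gives the same value as summing the two parts separately.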

This distance computation between the unlabeled sample and the labeled samples is
repeated for the three labeled samples closest to the unlabeled one. Applying the
transformation function $f_T$ then yields three distances in the $x$-$y$ domain. These
distances indicate where the unlabeled sample is located: with simple geometry, the
position of the unlabeled sample can be estimated. The intersection of the three circles
ideally yields a unique point, corresponding to the position of the unlabeled sample. In
practice, the intersection of the three circles yields an area proportional to the error of
the whole system. The position of the sample is approximated by the centroid of this area.

$$f_T: d_{fs}(S_i, S_k) \to d_{xy}(S_i, S_k) = r_i, \quad f_T: d_{fs}(S_j, S_k) \to d_{xy}(S_j, S_k) = r_j, \quad f_T: d_{fs}(S_p, S_k) \to d_{xy}(S_p, S_k) = r_p$$

where $S_i$, $S_j$ and $S_p$ are three labeled samples and $r_i$, $r_j$ and $r_p$ are the distances in
the space domain to the unlabeled sample $S_k$. Each distance is understood as a radius,
because the angle is unknown.
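
One common way to realize this geometric step is to subtract the circle equations pairwise, which leaves a 2x2 linear system in the unknown position. The sketch below does that. It is a substitute for the centroid-of-intersection-area rule described above, not the authors' implementation; in the ideal case, where the three circles meet in one point, both give the same position. The function name and the coordinates are hypothetical.

```python
import math


def trilaterate(anchors, radii):
    """Estimate (x, y) from three anchor positions and three radii.

    Subtracting circle 1 from circles 2 and 3 gives two linear equations,
    solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = radii
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Made-up layout in millimetres: three labeled positions and a true position.
anchors = [(0.0, 0.0), (7000.0, 0.0), (0.0, 10000.0)]
true_position = (2000.0, 3000.0)
radii = [math.hypot(true_position[0] - ax, true_position[1] - ay)
         for ax, ay in anchors]
estimate = trilaterate(anchors, radii)
print(estimate)
```

With noisy radii the linearized solution no longer coincides exactly with the centroid of the intersection area, so it should be read only as an approximation of the rule used in the text.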

Because the same relative distances exist between signals with different models, and
knowing that the greater the distortion, the farther the signal is from the audio source,
we choose the correspondences $(d_{xy}, d_{fs})$ between the samples that are closest to
the audio source and equidistant along the $d_{xy}$ axis. These points are used to
estimate a curve of $n$-th order, that is, the transformation function $f_T$. Normally this
function is a 4th-order polynomial, so a single distance in the feature space has several
solutions, that is, it yields different distances in the $x$-$y$ space domain. We overcome
this drawback by adding a new variable: the previous position of the robot. If we have an
approximate position of the robot, its speed and the computation time between feature
extraction samples, we obtain a coarse approximation of the new robot position that is
good enough to discriminate among the solutions of the 4th-order polynomial. A plot
of the function $f_T$ can be seen in the experiments section; it follows the model of the
partial differential equation of sound proposed in [1].

Figure 1 shows the localization system, including the wavelet transformation block,
the modeling blocks, the feature space and the spatial recognition block, which takes
as inputs the environment of the robot and the function $f_T$.



Fig. 1. Localization system in the space domain from non-speech audio signals. Labels
recoverable from the diagram: environmental properties, robot speed.



3 Experimental Results

In order to prepare a setting as realistic as possible, we have used a workshop with a
CNC milling machine as the non-speech audio source. The room measures 7 meters by 10
meters, and we obtain 9 labeled samples (from $S_1$ to $S_9$), acquired at regular
positions covering all the representative workshop surface. Given the dimensions of
the room, these 9 samples are enough, because oversampling does not yield a significant
variance. The arrangement of the labeled samples can be observed in figure 3 (right).
The robot [10] enters the room, follows a predefined trajectory and leaves. Along its
trajectory the robot picks up four unlabeled samples (audio signals) that are used as
test data for our algorithms ($S_{10}$, $S_{11}$, $S_{12}$ and $S_{13}$). The sampling frequency is
8 kHz and a capacitive microphone is used.

First, in order to obtain the coefficients of the 9 models corresponding to the 9 labeled
non-stationary audio signals, these signals are decomposed by the wavelet transform into
4 levels, with one approximation signal and 4 detail signals (figure 2). For all the
samples, the relevance of every signal is analyzed. We look for the most significant
decompositions to formulate the prediction model, that is, those containing most of the
energy of the signal. The approximation ($A4_i$) and the 4th-level detail signal ($D4_i$)
are enough to represent the original signal, because the mean and deviation of the
$D3_i$, $D2_i$ and $D1_i$ detail signals are two orders of magnitude below those of $A4_i$ and
$D4_i$. Figure 2 (upper right) shows the difference between the original signal and the
signal estimated with $A4_i$ and $D4_i$; when overlapped, there is practically no error. In
this experiment we have chosen the Daubechies 45 wavelet transform because, after
testing different Daubechies wavelets, it yields good results.
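
The energy-based selection of subbands can be illustrated with a small sketch. It uses the Haar wavelet (the simplest member of the Daubechies family) instead of the Daubechies 45 wavelet of the experiments, purely to keep the code short, and it works on a synthetic signal rather than on the recorded data. The function names are hypothetical.

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def band_energies(signal, levels):
    """Energy of each detail band D1..D<levels> and of the approximation A<levels>.

    The signal length must be a multiple of 2**levels.
    """
    energies = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        energies[f"D{level}"] = sum(v * v for v in detail)
    energies[f"A{levels}"] = sum(v * v for v in approx)
    return energies


# Slowly varying synthetic signal: most energy should sit in the low bands.
signal = [1.0 + math.sin(2.0 * math.pi * i / 32.0) for i in range(32)]
energies = band_energies(signal, 4)
total = sum(v * v for v in signal)
for band, energy in energies.items():
    print(f"{band}: {100.0 * energy / total:.2f}% of total energy")
```

Because the transform is orthonormal, the band energies add up to the energy of the signal, so each band's share can be compared directly when deciding which details to keep.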

After an initial step for selecting the model structure, it is determined that the order of
the model has to be 20 (10 for the $A4_i$ and 10 for the $D4_i$ coefficients), and a MAX model
has been selected, for the reasons explained above. Once these 9 models are calibrated,
they are validated with the FPE (Final Prediction Error) and MSE (Mean Square Error)
criteria, yielding values about 10 '" and 5% respectively, using 5000 data points
for identification and 1000 for validation. Besides, for all the estimated models the
autocorrelation of the residuals and the cross-correlation between the inputs and the
residuals show that they are uncorrelated, which indicates the goodness of the models.

These coefficients form the feature space, where the relative distances among all the
samples are calculated and related as explained in section 2 in order to obtain
the transform function $f_T$. With these relations, the curve appearing in figure 3 (left) is
obtained under the minimum square error criterion, approximated by a 4th-order
polynomial with the following expression:

$$f_T = d_{fs} = -9.65\times10^{-10}\,d_{xy}^4 + 1.61\times10^{-5}\,d_{xy}^3 - 8.49\times10^{-2}\,d_{xy}^2 + 144.89\,d_{xy} + 107.84 \qquad (10)$$

which is related to the solution of the sound equation in [1] and therefore has a
physical meaning.

Fig. 2. (Left) Multilevel wavelet decomposition of a non-speech signal ($S_2$) into an
approximation signal and four detail signals; (right) comparison between the original
signal and the estimated signal ($A4 + D4$), with its error (below), for $S_2$.

Fig. 3. (Left) Transform function $f_T$; (right) robot environment: labeled audio signals and
actual robot trajectory with unlabeled signals ($S_{10}$, $S_{11}$, $S_{12}$, $S_{13}$).

With the transform function $f_T$ we proceed to find the three minimum distances in
the feature space from each unlabeled sample to the labeled ones, that is, for audio
signals $S_{10}$, $S_{11}$, $S_{12}$ and $S_{13}$ with respect to $S_1, \dots, S_9$. We obtain four
solutions for each signal because each distance in the feature space crosses the $f_T$
curve four times. In order to discard the false solutions we use the previous position of
the robot, that is, the point $(x_i, y_i)_{prev}$. We also know the robot speed ($v$ = 15 cm/s)
and the computation time between consecutive positions given by the system, which is
close to 3 s. If the robot is assumed to move at constant speed, the new position
will be $(x_i, y_i)_{prev} \pm (450, 450)$ mm. With this information we choose the solution that
best fits the crossing-circles solution. Table 1 presents the recognition rate for each
estimated position in the space domain: in no case is the error greater than 15%, and in
one case it is below 0.5%.
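
The selection among the several solutions of the 4th-order polynomial can be sketched as a root search followed by a consistency check. The sketch below takes the coefficients printed in equation (10), read as powers 4 to 0 of $d_{xy}$, and assumes that $d_{xy}$ is expressed in millimetres (suggested by the 450 mm bound, but not stated for the polynomial itself). The search range of 12000 mm, the scan step and all function names are assumptions made for illustration.

```python
# Coefficients of equation (10), highest power first.
COEFFS = (-9.65e-10, 1.61e-5, -8.49e-2, 144.89, 107.84)


def transform(d_xy):
    """Evaluate f_T(d_xy) with Horner's rule."""
    value = 0.0
    for c in COEFFS:
        value = value * d_xy + c
    return value


def candidate_distances(d_fs, d_max, step=1.0, tol=1e-6):
    """All d_xy in [0, d_max] with f_T(d_xy) = d_fs (sign-change scan + bisection)."""
    roots = []
    a = 0.0
    ga = transform(a) - d_fs
    while a < d_max:
        b = min(a + step, d_max)
        gb = transform(b) - d_fs
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0.0:
            lo, hi, glo = a, b, ga
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                gmid = transform(mid) - d_fs
                if glo * gmid <= 0.0:
                    hi = mid
                else:
                    lo, glo = mid, gmid
            roots.append(0.5 * (lo + hi))
        a, ga = b, gb
    if ga == 0.0:
        roots.append(a)
    return roots


def pick_consistent(roots, predicted, max_travel):
    """Root closest to the predicted distance, if it lies within max_travel."""
    feasible = [r for r in roots if abs(r - predicted) <= max_travel]
    return min(feasible, key=lambda r: abs(r - predicted)) if feasible else None


# Hypothetical case: the true distance is 1000.5 mm, the prediction 1000 mm.
target = transform(1000.5)
roots = candidate_distances(target, d_max=12000.0)
chosen = pick_consistent(roots, predicted=1000.0, max_travel=450.0)
print(roots, chosen)
```

The 450 mm window corresponds to the product of the robot speed (15 cm/s) and the computation time (about 3 s) given in the text.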

Table 1. Spatial recognition rates for the unlabeled samples with respect to their actual positions.

Original signal        S10             S11             S12             S13
Cartesian coord.       x10     y10     x11     y11     x12     y12     x13     y13
Recognition rate (%)   90.4    85      97.98   87.69   89.18   99.58   88.35   94.42



4 Conclusions 

With the methodology presented in this article we have achieved some interesting
results that encourage the authors to continue working in this research field. The
introduction of more than one audio source is also a new challenge. The experimental
results show a close correspondence with the physical model of sound, which demonstrates
the high reliability of the proposed methodology.

Abstract. Denoising of speech signals using a sliding discrete cosine
transform (DCT) is proposed. A minimum mean-square error (MMSE)
estimator in the domain of a sliding DCT is derived. In order to provide speech
processing in real time, a fast recursive algorithm for computing the sliding
DCT is presented. The algorithm is based on a recursive relationship between
three subsequent local DCT spectra. Extensive testing has shown that
background noise in a real environment such as a helicopter cockpit can be
made imperceptible by a proper choice of suppression parameters.



1 Introduction 

Processing of speech degraded by additive background noise is of interest in a
variety of tasks. For example, many speech transmission and coding systems, whose
design is predicated on a relatively noise-free environment, degrade quickly in quality
and performance in the presence of background noise. Thus, there is considerable
interest in, and application for, the development of systems that compensate for
the presence of noise. In many cases, intelligibility is affected by background noise, so
that a principal objective of a speech processing system may be to improve
intelligibility. Numerous systems have been proposed to remove or reduce
background noise [1-8]. These systems provide an apparent improvement in
signal-to-noise ratio, but intelligibility is in fact reduced. In this paper, an approach to
speech denoising based on a sliding DCT is used.

In many filtering and spectral analysis applications, signals such as speech have
inherently infinite length. Moreover, since the signal properties (amplitudes,
frequencies, and phases) usually change with time, a single orthogonal transform is
not sufficient to describe such signals. As a result, the concept of short-time signal
processing with filtering in the domain of an orthogonal transform can be used [9].
The short-time orthogonal transform of a signal $x_k$ is defined as

$$X_s^k = \sum_{n=-\infty}^{\infty} x_{k+n}\, w_n\, \psi(n, s) \qquad (1)$$

where $w_n$ is a window sequence and $\psi(n,s)$ represents the basis functions of an
orthogonal transform. Equation (1) can be interpreted as the orthogonal transform of
$x_{k+n}$ as viewed through the window $w_n$. $X_s^k$ displays the orthogonal transform
characteristics of the signal around time $k$. Note that while increased window length
and resolution are typically beneficial in the spectral analysis of stationary data, for
time-varying data it is preferable to keep the window length sufficiently short so that
the signal is approximately stationary over the window duration.

We assume that the window has finite length around $n = 0$, and that it is unity for all
$n \in [-N_1, N_2]$. Here $N_1$ and $N_2$ are integer values. This leads to signal processing in a
sliding window [10]. In other words, local filters in the domain of an orthogonal
transform at each position of a moving window modify the orthogonal transform
coefficients of a signal to obtain only an estimate of the sample $x_k$ of the window. The
choice of orthogonal transform for sliding signal processing depends on many factors.
The DCT is one of the most appropriate transforms with respect to the accuracy of
power spectrum estimation from the observed data that is required for local filtering,
the filter design, and the computational complexity of the filter implementation. For
example, linear filtering in the domain of the DCT followed by inverse transforming is
superior to that of the discrete Fourier transform (DFT), because a DCT can be
considered as the DFT of a signal evenly extended outside its edges. This attenuates the
boundary (temporal aliasing) effects caused by circular convolution that are typical for
linear filtering in the domain of the DFT. In the case of the DFT, speech frames are
usually windowed to avoid temporal aliasing and to ensure a smooth transition of
filters in successive frames. For filtering in the domain of the DCT, the windowing
operation can be skipped, which further reduces the computational complexity.

The presentation is organized as follows. In Section 2, we present a computationally
efficient algorithm for computing the sliding DCT. In Section 3, an explicit filter
formula satisfying the MMSE criterion defined in the domain of the sliding DCT is
derived. We also test the filter performance in a real environment such as a helicopter
cockpit. Section 4 summarizes our conclusions.



2 Fast Algorithm for Computing the Sliding DCT 

The discrete cosine transform is widely used in many signal processing applications,
such as adaptive filtering, video signal processing, feature extraction, and data
compression. This is because the DCT performs close to the optimum Karhunen-Loeve
transform for first-order Markov stationary data when the correlation
coefficient is near 0.9 [11]. Four types of DCT have been classified [12]. The DCT
discussed in this paper is of type II. The kernel of the DCT of order $N$ is defined as

$$\psi(n, s) = k_s \cos\left(\frac{\pi (n + 1/2)\, s}{N}\right) \qquad (2)$$

where $n, s = 0, 1, \dots, N-1$; $k_s = 1/\sqrt{2}$ if $s = 0$, and $k_s = 1$ otherwise. For clarity, the
normalization factor $\sqrt{2/N}$ for the forward transform is neglected until the inverse
transform. The sliding cosine transform (SCT) is defined as

$$X_s^k = \sum_{n=-N_1}^{N_2} x_{k+n} \cos\left(\frac{\pi (n + N_1 + 1/2)\, s}{N}\right) \qquad (3)$$

where $N = N_1 + N_2 + 1$ and $\{X_s^k;\ s = 0, 1, \dots, N-1\}$ are the transform coefficients around
time $k$. The coefficients of the DCT can be obtained as
$\{C_0^k = X_0^k/\sqrt{2};\ C_s^k = X_s^k,\ s = 1, \dots, N-1\}$. We now derive a fast algorithm for the
SCT based on a recursive relationship between three subsequent local DCT spectra [13].
The local DCT spectra at the window positions $k-1$ and $k+1$ are given by

$$X_s^{k-1} = \sum_{n=-N_1}^{N_2} x_{k-1+n} \cos\left(\frac{\pi (n + N_1 + 1/2)\, s}{N}\right) \qquad (4)$$

$$X_s^{k+1} = \sum_{n=-N_1}^{N_2} x_{k+1+n} \cos\left(\frac{\pi (n + N_1 + 1/2)\, s}{N}\right) \qquad (5)$$

Using properties of the cosine function and equations (4) and (5), we can write

$$X_s^{k+1} = 2 X_s^k \cos\left(\frac{\pi s}{N}\right) - X_s^{k-1} + \cos\left(\frac{\pi s}{2N}\right)\left[x_{k-N_1-1} - x_{k-N_1} + (-1)^s \left(x_{k+N_2+1} - x_{k+N_2}\right)\right] \qquad (6)$$



We see that the computation of the DCT at the window position $k+1$ involves values
of the input sequence $x_k$ as well as the DCT coefficients computed at the two previous
positions of the moving window. The number of arithmetic operations required for
computing the sliding discrete cosine transform at a given window position is
evaluated as follows: the SCT of order $N$ with $N = N_1 + N_2 + 1$ requires $2(N-1)$
multiplications and $2N+5$ additions; the DCT requires one extra multiplication.
Table 1 lists the computational complexity of the proposed algorithm and of known fast
DCT algorithms. Note that fast DCT algorithms require the length of the moving window
to be a power of 2, $N = 2^M$. In contrast, the length of the moving window for the
proposed algorithm is an arbitrary integer value determined by the characteristics of
the signal to be processed.
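
The recursion can be checked numerically against the direct definition. The sketch below assumes the sliding transform is $X_s^k = \sum_{n=-N_1}^{N_2} x_{k+n} \cos(\pi (n + N_1 + 1/2) s / N)$ and that the update from positions $k-1$ and $k$ to $k+1$ adds $\cos(\pi s / 2N)\,[x_{k-N_1-1} - x_{k-N_1} + (-1)^s (x_{k+N_2+1} - x_{k+N_2})]$ to $2 X_s^k \cos(\pi s / N) - X_s^{k-1}$. Function names and the test signal are hypothetical, and the cosines are recomputed on every call for readability, whereas a real-time version would tabulate them.

```python
import math


def direct_sct(x, k, n1, n2):
    """Sliding cosine transform around time k, computed from its definition."""
    size = n1 + n2 + 1
    return [
        sum(x[k + n] * math.cos(math.pi * (n + n1 + 0.5) * s / size)
            for n in range(-n1, n2 + 1))
        for s in range(size)
    ]


def next_sct(prev, curr, x, k, n1, n2):
    """Spectrum at k+1 from the spectra at k-1 (prev) and k (curr)."""
    size = n1 + n2 + 1
    leaving = x[k - n1 - 1] - x[k - n1]      # samples at the trailing edge
    entering = x[k + n2 + 1] - x[k + n2]     # samples at the leading edge
    out = []
    for s in range(size):
        sign = -1.0 if s % 2 else 1.0
        out.append(
            2.0 * curr[s] * math.cos(math.pi * s / size)
            - prev[s]
            + math.cos(math.pi * s / (2 * size)) * (leaving + sign * entering)
        )
    return out


# Arbitrary test signal and window (N1 = N2 = 3, so N = 7).
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(40)]
n1 = n2 = 3
k = 10
recursive = next_sct(direct_sct(signal, k - 1, n1, n2),
                     direct_sct(signal, k, n1, n2),
                     signal, k, n1, n2)
reference = direct_sct(signal, k + 1, n1, n2)
max_error = max(abs(a - b) for a, b in zip(recursive, reference))
print(max_error)
```

The update touches only four input samples per window shift, which is why its cost grows linearly with $N$ and does not require $N$ to be a power of 2.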

The advantage of the proposed algorithm grows as the length of the window increases.

Table 1. Number of multiplications and additions for computing the sliding DCT

  N      Fast DCT [14, 15]      Proposed algorithm
         Mult.      Add.        Mult.      Add.
  16     33         81          30         37
  32     81         209         62         69
  64     193        513         126        133
  128    449        1217        254        261
  256    1025       2817        510        517



The inverse algorithm for the sliding DCT can be written as follows:

$$x_k = \frac{1}{N}\left[2\sum_{s=1}^{N-1} X_s^k \cos\left(\frac{\pi (N_1 + 1/2)\, s}{N}\right) + X_0^k\right] \qquad (7)$$

where $N = N_1 + N_2 + 1$. The computational complexity is $N$ multiplications
and $N$ additions. If $x_k$ is the central pixel of the window, that is, $N_1 = N_2$ and
$N = 2N_1 + 1$, then the inverse transform simplifies to

$$x_k = \frac{1}{N}\left[2\sum_{s=1}^{N_1} (-1)^s X_{2s}^k + X_0^k\right] \qquad (8)$$

We note that only the spectral coefficients with even indices are involved in the
computation. The computation requires one multiplication and $N_1 + 1$ additions.
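
The simplification for the central sample can be verified numerically. The sketch below assumes an odd window length $N = 2N_1 + 1$, an unnormalized forward transform $X_s = \sum_n x_n \cos(\pi (n + 1/2) s / N)$ over the local window index $n = 0, \dots, N-1$, and the inverse $x_k = \frac{1}{N}[2\sum_{s=1}^{N_1} (-1)^s X_{2s} + X_0]$ for the central sample. Function names and window values are hypothetical.

```python
import math


def window_spectrum(window):
    """Unnormalized DCT-II of one window (local index n = 0..N-1)."""
    size = len(window)
    return [
        sum(v * math.cos(math.pi * (n + 0.5) * s / size)
            for n, v in enumerate(window))
        for s in range(size)
    ]


def central_sample(spectrum):
    """Recover the central window sample from the even-indexed coefficients."""
    size = len(spectrum)
    if size % 2 == 0:
        raise ValueError("window length must be odd (N = 2*N1 + 1)")
    n1 = (size - 1) // 2
    acc = spectrum[0]
    for s in range(1, n1 + 1):
        sign = -1.0 if s % 2 else 1.0
        acc += 2.0 * sign * spectrum[2 * s]
    return acc / size


# Arbitrary window of length 7; the central sample is window[3].
window = [1.0, -2.0, 3.5, 0.25, 4.0, -1.5, 2.0]
recovered = central_sample(window_spectrum(window))
print(recovered)
```

Only the coefficients with even index enter the sum, so a filter that is followed by this inverse needs to process roughly half of the spectrum.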



3 Denoising of Speech Signals in the Sliding DCT Domain 

The objective of this section is to develop a noise suppression technique based on
the sliding DCT, and to test the performance of the algorithm in a real noise
environment. We design locally adaptive filters to enhance noisy speech. Assume that
a clean speech signal $\{a_k\}$ is degraded by zero-mean additive noise $\{v_k\}$:

$$x_k = a_k + v_k \qquad (9)$$

where $\{x_k\}$ is the noisy speech sequence.

Let $\{X_s^k, A_s^k, V_s^k, \hat{A}_s^k;\ s = 0, 1, \dots, N-1\}$ be the DCT coefficients around
time $k$ of the noisy speech, clean speech, noise, and filtered signal, respectively. Here
$N = 2N_1 + 1$ is the length of the DCT. Note that $N_1$ is an arbitrary integer value, which is
determined by the pitch period of speech. It can be chosen to be approximately the
maximum expected pitch period for adequate frequency resolution.

Various criteria can be exploited for the filter design. In the following analysis we
use the criterion of the MMSE around time $k$, which is defined in the domain of the DCT,
taking into account (8), as follows:

$$MMSE_k = E\{\,\dots\,\} \qquad (10)$$

where $E\{\cdot\}$ denotes the expected value.

As mentioned above, the length of the window is chosen in such a way that the noise
can be considered stationary within the window. Let $P_s^k = E\{|V_s^k|^2\}$ denote the power
spectrum of the noise in the domain of the DCT. Suppose that $\hat{A}_s^k = X_s^k H_s^k$, where
$H_s^k$ is the filter to be designed around time $k$. By minimizing $MMSE_k$ with respect to
$H_s^k$, we arrive at a version of the Wiener filter in the domain of the DCT:

$$H_s^k = \frac{|X_s^k|^2 - P_s^k}{|X_s^k|^2} \qquad (11)$$

The MMSE estimate of the processed speech in the domain of the sliding DCT is
given by

$$\hat{A}_s^k = \begin{cases} \left(1 - \dfrac{P_s^k}{|X_s^k|^2}\right) X_s^k, & \text{if } |X_s^k|^2 > P_s^k \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$



The obtained filter can be considered a spectral subtraction method in the domain
of the sliding DCT. In general, spectral subtraction methods [1], while reducing the
wide-band noise, introduce a new “musical” noise due to the presence of remaining
spectral peaks. To attenuate the “musical” noise, one can oversubtract the power
spectrum of the noise by introducing a nonzero power spectrum bias. Finally,
the MMSE estimate of the processed speech in the domain of the sliding DCT can
be written as follows:

$$\hat{A}_s^k = \begin{cases} \left(1 - \dfrac{P_s^k}{|X_s^k|^2}\right) X_s^k, & \text{if } |X_s^k|^2 > P_s^k + B^k \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$

where $B^k$ is a speech-dependent bias value.
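
The thresholded suppression rule can be sketched in a few lines. The sketch assumes that the rule in (13) scales each noisy coefficient $X_s$ by $1 - P_s/|X_s|^2$ when $|X_s|^2$ exceeds the noise power $P_s$ plus the bias, and sets it to zero otherwise. The bias is treated here as a single scalar per window, which is an assumption; the function name and the numbers are made up for illustration.

```python
def suppress(noisy_coeffs, noise_power, bias=0.0):
    """Apply the spectral-subtraction gain with an oversubtraction bias.

    noisy_coeffs: DCT coefficients X_s of one window of noisy speech.
    noise_power:  estimated noise power P_s for the same coefficients.
    bias:         extra margin added to P_s in the threshold test.
    """
    filtered = []
    for x_s, p_s in zip(noisy_coeffs, noise_power):
        energy = x_s * x_s
        if energy > p_s + bias:
            filtered.append((1.0 - p_s / energy) * x_s)
        else:
            # Coefficient is at or below the noise floor: remove it.
            filtered.append(0.0)
    return filtered


coeffs = [4.0, 1.0, -2.0]
noise = [4.0, 4.0, 1.0]
print(suppress(coeffs, noise))            # bias = 0: plain rule
print(suppress(coeffs, noise, bias=4.0))  # larger bias removes weak peaks
```

Raising the bias removes more of the isolated low-energy peaks that are heard as “musical” noise, at the cost of also discarding weak speech components.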

The filtered speech signal can be obtained with the use of (8). It also follows from (8)
that only the spectral coefficients with even indices are involved in the estimation.

A test speech signal recorded in a helicopter environment is presented in Fig. 1. The
data were sampled at 16 kHz. In our tests a window length of 761 samples is used.
The squared sliding DCT coefficients, averaged over all positions of the running
window, are presented in Fig. 2 for the noisy speech. The power spectrum of the noise is
obtained by actual measurement of the background noise in intervals where speech is
not present. It is shown in Fig. 3. We see the difference between the spectral
distributions of the speech and of the helicopter noise, which helps us to suppress the
helicopter noise.




Fig. 1. Time waveform of helicopter speech

Fig. 2. Average squared DCT magnitude of noisy speech

Fig. 3. Average squared DCT magnitude of noise

Fig. 4. Enhanced speech signal

The result of filtering with the proposed filter is shown in Fig. 4. It is clear that
the system is capable of significant noise reduction. Numerous formal subjective tests
have shown that the helicopter noise can be made imperceptible by a proper choice of the
filter parameters in (13).

In this section we derived a filter for noise suppression under the assumption that
speech is always present in the measured data. However, if a given frame of data
consists of noise alone, then obviously a better suppression filter can be used [5, 6]. In
general, an optimal algorithm should include a detector of voiced and unvoiced
speech signals. After detection, different processing strategies should be applied to
voiced and unvoiced speech signals.



4 Conclusions

In this paper, we have presented a new technique for enhancing speech degraded by
additive noise. The technique utilizes the sliding DCT. An MMSE estimator in the
domain of the sliding DCT has been derived. In order to provide speech processing in
real time, a fast recursive algorithm for computing the sliding DCT has been
suggested. The algorithm requires substantially fewer multiplications and
additions than known fast DCT algorithms. Extensive testing has shown
that background noise such as that in a helicopter cockpit can be significantly reduced
by a proper choice of suppression parameters.

Abstract. In Continuous Speech Recognition (CSR) systems, acoustic
and Language Models (LM) must be integrated. It is well known that heuristic
factors must be optimised to obtain optimum CSR performance.
Due to its great effect on final CSR performance, the exponential scaling
factor applied to LM probabilities is the most important. LM probabilities
are obtained after applying a smoothing technique. The use of the
scaling factor implies a redistribution of the smoothed LM probabilities,
i.e., a new smoothing is obtained. In this work, the relationship between
the amount of smoothing of LMs and the new smoothing achieved by
the scaling factor is studied. High- and low-smoothed LMs, obtained with
well-known discounting techniques, were integrated into the CSR system. The
experimental evaluation was carried out on two Spanish speech application
tasks with very different levels of difficulty. The strong relationship
observed between the two redistributions of the LM probabilities was
independent of the task. When an adequate value of the scaling factor was
applied, the optimum CSR performances obtained were not very different, in
spite of the great differences between perplexity values.



1 Introduction 

In Continuous Speech Recognition (CSR) systems a Language Model (LM) is
required to represent the syntactic constraints of the language. However, there is
a large number of word sequences that do not appear in training and could
appear in testing. Thus, a certain probability mass must be subtracted from the
seen combinations and redistributed among the unseen ones, i.e., a smoothing
technique must be applied [1] [2].

The test set perplexity is typically used to evaluate the quality of the LM [1]
[2]. Perplexity can be interpreted as the (geometric) average branching factor of
the language according to the model. It is a function of both the language and
the model. It is assumed that the “best” models yield the “lowest” Word Error
Rates (WER) of the CSR system. But there are plenty of counterexamples in the
literature [3]. The ability of the test set perplexity to predict the real behavior of a
smoothing technique when working in a CSR system could be questioned because
it does not take into account the relationship with acoustic models. Several
attempts have been made to devise metrics that are better correlated with the
application error rate than perplexity [4]. But for now perplexity remains the
main metric for practical language model construction [3]. In fact, the quality of
the model must ultimately be measured by its effect on the specific application
for which it was designed, namely by its effect on the system error rate. However,
error rates are typically non-linear and poorly understood functions of language
models [3]. In this work we try to clarify how the smoothing technique applied
to the LM works in the CSR system and to show its real impact on final system
error rates.

* This work has been partially supported by the Spanish CICYT under grant
TIC2002-04103-C03-02 and by the Basque Country University under grant
(9/UPV 00224.310-13566/2001)

Integration of language and acoustic models is invariably based on the well-known
Bayes’ rule. However, the best performance of a CSR system is obtained when the
LM probabilities in Bayes’ rule are modified by introducing an exponential scaling
factor [5] [6]. This factor can be understood as a new redistribution of the smoothed
LM probabilities. As a consequence, LMs are smoothed twice: first by means of the
smoothing technique and then by the exponential scaling parameter. The aim of this
work is to establish a relationship between the amount of smoothing given by the
smoothing technique and the amount of smoothing achieved by the exponential
scaling factor (see Section 2).

Thus, different amounts of smoothing need to be applied to LMs. Two different
well-established smoothing techniques, leading to high- and low-smoothed
LMs respectively, have therefore been evaluated (see Section 3). The relationship
between the amount of smoothing given by the smoothing technique and the
amount of smoothing achieved by the exponential scaling factor is studied in
terms of both classical test set perplexity and CSR performance. CSR performance
was evaluated in terms of both the obtained WER and the computational cost
involved (see Section 4). The experimental evaluation was carried out on two
Spanish databases of very different difficulty, recorded by two consortia of Spanish
research groups to work on understanding and dialogue systems, respectively.
Finally, some concluding remarks are given in Section 5.



2 Introducing the LM in the CSR System 

Within a CSR system there are several heuristic parameters that must be 
adjusted to obtain optimum performance, such as the beam-search factor used to 
reduce the computational cost. The most important, due to its great effect on 
final CSR performance, may be the exponential scaling factor $\alpha$ applied 
to LM probabilities in Bayes' rule [5]. According to Bayes' rule, the 
recognizer must find the word sequence $\hat{\Omega}$ that satisfies: 

$$\hat{\Omega} = \arg\max_{\Omega} P(\Omega)^{\alpha} P(A/\Omega) \qquad (1)$$

where $P(\Omega)$ is the probability that the word sequence 
$\Omega = \omega_1 \omega_2 \ldots \omega_{|\Omega|}$ from some previously 
established finite vocabulary $\Sigma = \{\omega_j\}$, $j = 1 \ldots |\Sigma|$, 
will be uttered and $P(A/\Omega)$ is the probability of the sequence of 
acoustic observations $A = a_1 a_2 \ldots a_{|A|}$ for a given sequence of 
words $\Omega$. Probabilities $P(A/\Omega)$ are represented by acoustic models, 
usually Hidden Markov Models (HMM). The a priori probabilities $P(\Omega)$ are 
given by the LM. 
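
The decision rule of Eq. (1) is normally evaluated in the log domain, where the 
exponent becomes a multiplicative weight on the LM score. The following is a 
minimal sketch under that assumption; the function names and the toy 
hypotheses with their probabilities are hypothetical and are not taken from the 
experiments reported here. 

```python
import math


def combined_log_score(lm_prob, acoustic_prob, alpha):
    """Log of P(Omega)^alpha * P(A/Omega), i.e. the quantity maximised in Eq. (1)."""
    return alpha * math.log(lm_prob) + math.log(acoustic_prob)


def best_hypothesis(hypotheses, alpha):
    """Return the word sequence with the highest combined score.

    hypotheses: list of (word_sequence, lm_prob, acoustic_prob) tuples.
    """
    best = max(hypotheses, key=lambda h: combined_log_score(h[1], h[2], alpha))
    return best[0]


# Hypothetical scores: the second sequence fits the acoustics slightly better,
# the first one is preferred by the LM.
hyps = [
    ("quiero un billete", 1e-3, 1e-42),
    ("quiero un villete", 1e-4, 1e-40),
]

for alpha in (1.0, 3.0):
    print(alpha, best_hypothesis(hyps, alpha))
```

With these toy values the acoustically preferred sequence wins when no scaling 
is applied, and the sequence preferred by the LM wins once the LM score is 
weighted by a larger exponent. 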

From a theoretical point of view, the scaling parameter $\alpha$ is needed 
because the acoustic and LM probability distributions are not the true 
distributions but approximations [5]. The two probability distributions are 
estimated independently using different stochastic models that represent 
different knowledge sources. Moreover, the parameters of the acoustic and 
language models are estimated on the basis of speech and text corpora, 
respectively. Each corpus was designed for a different purpose, and they 
therefore differ in vocabulary, size, complexity, etc. Thus, a balance 
parameter $\alpha$ needs to be applied to lessen these effects and obtain good 
system performance. 

In practice, acoustic and LM probabilities have very different ranges of 
values. The accumulated probability at the end of each partial sequence of 
words $\Omega$ in the Viterbi trellis is a combination of acoustic 
$P(A/\Omega)$ and language $P(\Omega)$ probabilities. Acoustic probabilities 
are usually smaller than language probabilities and are applied many more 
times. The gap among accumulated probabilities is therefore usually bigger than 
the gap among LM probabilities. The immediate consequence is that LM 
probabilities are irrelevant in most situations for deciding which path to 
choose¹ [7]. However, when LM probabilities are raised to a power 
$\alpha > 1$, $(P(\Omega))^{\alpha}$, all of them are attenuated, but this 
attenuation is higher for lower probability values. A bigger gap is therefore 
obtained between high and low probabilities, and LM probabilities become more 
relevant when deciding the next word combination. There is a maximum value of 
$\alpha$ beyond which LM probabilities are overvalued. 
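
As a worked illustration of this attenuation (the values are hypothetical and 
not taken from the experiments), consider two LM probabilities $P_h = 0.1$ and 
$P_l = 0.01$ and a scaling factor $\alpha = 4$: 

```latex
\begin{aligned}
P_h^{\alpha} &= 0.1^{4} = 10^{-4}, \qquad P_l^{\alpha} = 0.01^{4} = 10^{-8},\\
\frac{P_h^{\alpha}}{P_l^{\alpha}} &= \left(\frac{P_h}{P_l}\right)^{\alpha} = 10^{4}
\quad\text{whereas}\quad \frac{P_h}{P_l} = 10,\\
\log P_h^{\alpha} - \log P_l^{\alpha} &= \alpha\,(\log P_h - \log P_l).
\end{aligned}
```

Both values shrink, but the lower one shrinks more, so in the log domain the 
gap between them is simply multiplied by $\alpha$. 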

It is important to notice that the smoothing technique clearly defines the 
LM probability distributions and thus the “a priori” gap among probabilities. 
Therefore, the relationship between the smoothing technique and the exponential 
scaling factor applied to LM probabilities must be established. 



3 High and Low Smoothed LMs 

The purpose of this work was not to carry out an exhaustive comparison of 
smoothing techniques, as other authors have done [1] [2]. The main goal was to 
observe the relationship between the amount of smoothing given by the smoothing 
technique and the amount of smoothing achieved by the exponential scaling 
factor. Thus, two well-known back-off smoothing techniques [8] involving very 
different amounts of discounting were chosen. Witten-Bell (WBd) and Add-One 
(AOd) discounting were used to obtain high- and low-smoothed LMs respectively. 
In high-smoothed LMs the probability reserved by the smoothing technique for 
unseen events is bigger than in low-smoothed LMs. As a consequence, the gap 
between LM probabilities in high-smoothed LMs is smaller than in low-smoothed 
models. 

¹ This phenomenon is also related to the problem of the negligible impact that 
transition probabilities have in acoustic models. 

The amount of discounting performed by the Witten-Bell and Add-One techniques 
does not need to be adjusted by any additional parameter, unlike in other 
well-known techniques such as Kneser-Ney, linear discounting, etc. [1] [2]. In 
both cases the amount of discounting is fixed and fully defined by the 
technique. 

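If $h = \omega_{i-(n-1)}^{i-1}$ is a history representing a sequence of $n-1$ 
words, $N(\omega_i/h)$ is the number of times that word $\omega_i$ appears 
after history $h$, 
$N(h) = \sum_{\forall \omega_i / N(\omega_i/h) \neq 0} N(\omega_i/h)$ and 
$\beta(\omega_i/h^*)$ is the probability distribution of a more general model 
($h^*$ represents a history of words of length less than $h$), the smoothed LM 
probability $P(\omega_i/h)$ is calculated as: 

$$P(\omega_i/h) =
\begin{cases}
(1-\lambda)\,\dfrac{N(\omega_i/h)}{N(h)} & N(\omega_i/h) \neq 0 \\[2ex]
\left[1 - \sum\limits_{\forall \omega_j / N(\omega_j/h) \neq 0} (1-\lambda)\,\dfrac{N(\omega_j/h)}{N(h)}\right]
\dfrac{\beta(\omega_i/h^*)}{\sum\limits_{\forall \omega_j / N(\omega_j/h) = 0} \beta(\omega_j/h^*)} & N(\omega_i/h) = 0
\end{cases}
\qquad (2)$$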



$(1 - \lambda)$ represents the discount factor: seen events keep a fraction 
$(1 - \lambda)$ of their relative frequency, and the remaining probability mass 
$\lambda$ is subtracted and then redistributed among unseen events. The 
discount factor $(1 - \lambda)$ can have very different formulations [1] [2]. 
Here, $(1 - \lambda)$ has been set to obtain high- and low-smoothed LMs using 
Witten-Bell and Add-One discounting respectively. Both discounting methods are 
explained in the following paragraphs. 



High-Smoothing: Witten-Bell Discounting 

In Witten-Bell, the discount $(1 - \lambda)$ depends fundamentally on the 
number of different events $T$ following the history $h$. That is: 

$$1 - \lambda = \frac{N(h)}{N(h) + T} \qquad (3)$$



It is widely used since it leads to low test set perplexities when compared 
to other classical back-off methods [1]. However, a dependence was found [2] 
between perplexity and the size of the LM training set when Witten-Bell 
discounting was used. 

In this case a considerable probability mass is assigned to unseen events 
(high smoothing) and the gap between seen and unseen probabilities is reduced. 
Combinations of words unseen in training can have a relatively high probability 
in test. 



Low-Smoothing: Add-One Discounting 

This is a very simple discounting method, adding one to all the counts. It was 
calculated as: 

$$1 - \lambda = \frac{N(h)}{N(h) + 1} \qquad (4)$$

This method does not usually perform well and thus is not commonly used by 
itself. It is usually applied as part of more complicated methods² [1]. 

Since $1 < T$, add-one discounting redistributes a smaller probability mass 
among unseen events (low smoothing) than Witten-Bell discounting. The gap among 
LM probabilities is therefore bigger using Add-One discounting. 



3.1 LM Evaluation in Perplexity 



Topics related to building LMs, such as smoothing techniques, are usually 
evaluated in terms of perplexity. The test set Perplexity (PP) is based on the 
mean log probability that an LM assigns to a test set $\omega_1^L$ of size $L$. 
It is thus based exclusively on the probability of words which actually occur 
in the test set, as follows: 

$$PP = P(\omega_1^L)^{-1/L} = e^{-\frac{1}{L}\sum_{i=1}^{L} \log P(\omega_i/\omega_1^{i-1})} \qquad (5)$$



The test set perplexity measures the branching factor associated with a task, 
which depends on the number of different words in the text. Low perplexity 
values are obtained when the LM being evaluated assigns high probabilities to 
the test set events. When the test set includes a high number of unseen 
combinations of $n$ words, the probability $P(\omega_i/\omega_1^{i-1})$ mainly 
depends on the smoothing technique. In such a case, 
$P(\omega_i/\omega_1^{i-1})$ is lower for low-smoothed LMs and, as a 
consequence, poor (high) perplexity values are obtained. Thus, high-smoothing 
techniques lead to good perplexity values when evaluated over test sets 
including a high number of unseen events. However, this good LM behavior is not 
always confirmed by the performance of the CSR system, which also includes the 
acoustic models [4]. 
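
The perplexity of Eq. (5) can be computed directly from the probabilities that 
an LM assigns to the words of a test set. The following sketch assumes those 
per-word probabilities are already available; the function name and the 
example values are hypothetical. 

```python
import math


def perplexity(word_probs):
    """PP = exp(-(1/L) * sum_i log P(w_i / history)), as in Eq. (5)."""
    length = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / length)


# Hypothetical test set of three words whose last word was unseen in training.
# The high-smoothed LM gives less to seen events and more to the unseen one.
high_smoothed = [0.20, 0.20, 0.05]
low_smoothed = [0.25, 0.25, 0.005]

print(round(perplexity(high_smoothed), 2), round(perplexity(low_smoothed), 2))
```

With these values the low-smoothed model obtains the higher perplexity because 
the very small probability of the unseen event dominates the product. 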



4 Experimental Evaluation 

In this section the relationship between the two redistributions of the LM prob- 
abilities, i.e., the application of the smoothing technique and the scaling factor, 
is experimentally established. The experimental evaluation was carried out with 
two Spanish databases of very different levels of difficulty: Bdgeo and Info_Tren. 

Bdgeo is a task-oriented Spanish speech corpus [9] consisting of 82000 words 
and a vocabulary of 1208 words. This corpus represents a set of queries to a 
Spanish geography database. This is a specific task designed to test integrated 
systems (acoustic, syntactic and semantic modelling) in automatic speech 
understanding. The training corpus consisted of 9150 sentences and the test set 
of 600 sentences. Recording was carried out by 12 speakers in laboratory 
environments at 16 kHz. 

² This technique is applied in Katz's discounting when all events at one state 
q are seen more than r times [1]. 

The Info_Tren database has recently been recorded as part of a project to 
develop a dialogue system. Info_Tren is a very difficult task of spontaneous 
Spanish speech dialogues with a vocabulary of around 2000 words plus 15 
different acoustic types of disfluencies such as noises, filled pauses, 
lengthenings, etc. [10]. The task consisted of 227 Spanish dialogues on train 
information. They were recorded at 8 kHz across telephone lines, applying the 
well-known Wizard of Oz mechanism. The training corpus consisted of 191 
dialogues uttered by 63 different speakers (1349 user turns resulting in 16500 
words plus 5000 disfluencies). The test set consisted of 36 dialogues 
corresponding to 12 different speakers (308 user turns including 4000 words 
plus around 500 disfluencies). Info_Tren is the first spontaneous dialogue 
database recorded by Castilian Spanish speakers. 

High- and low-smoothed n-gram LMs with n = 2 ... 4 were obtained using 
Witten-Bell (WBd) and Add-One (AOd) discounting respectively. Table 1 shows 
the perplexity (PP) results obtained. The LMs associated with the Info_Tren 
database included the disfluencies as part of the vocabulary, and there was a 
considerable mismatch between the training and test sets [10]. As a 
consequence, the perplexity values associated with this task are quite high. 

For both tasks, the best (lowest) PP values were obtained using high-smoothed 
LMs (WBd). Nevertheless, the differences between high- and low-smoothed models 
were larger for the Info_Tren task. In this task the number of word sequences 
appearing in the test set but not in the training set is higher than in the 
Bdgeo task. Higher PP values are obtained using low-smoothed LMs, since they 
assign lower back-off probabilities than high-smoothed LMs to those sequences. 
For the Bdgeo task, the best PP values were obtained with 4-grams using both 
high- and low-smoothed LMs. However, for the Info_Tren task the best PP results 
were reached with 3-grams (trigrams) when high-smoothed LMs were used and with 
2-grams (bigrams) when low-smoothed LMs were used. 

The LMs in Table 1 were integrated into a Spanish CSR system. Uttered 
sentences were decoded by the time-synchronous Viterbi algorithm with a fixed 
beam-search to reduce the computational cost. A chain of Hidden Markov Models 
was used to represent the acoustic model of the phonetic chain of each word. 
Different exponential scaling parameters were applied to the LM probabilities 
($\alpha = 1 \ldots 7$). Table 2 shows the CSR performance obtained: the Word 
Error Rate (WER) and the Average number of Active Nodes (AAN) (including both 
acoustic and LM nodes) needed to decode a sentence. Optimum performances are 
emphasized and underlined. 
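
To make the interaction between a fixed beam and the scaling factor concrete, 
the sketch below counts how many competing paths stay inside the beam when the 
LM log probabilities are weighted by $\alpha$. It is a simplified illustration, 
not the decoder used in the experiments: the LM probabilities, the common 
acoustic score and the beam width are hypothetical values. 

```python
import math


def active_after_beam(log_scores, beam):
    """Number of paths whose score lies within `beam` of the best path."""
    best = max(log_scores)
    return sum(1 for score in log_scores if score >= best - beam)


def path_scores(lm_probs, acoustic_log_prob, alpha):
    """Combined log scores alpha * log P(Omega) + log P(A/Omega)."""
    return [alpha * math.log(p) + acoustic_log_prob for p in lm_probs]


LM_PROBS = [0.5, 0.3, 0.15, 0.05]   # hypothetical LM probabilities of four paths
ACOUSTIC_LOG_PROB = -50.0           # same acoustic score for all, for simplicity
BEAM = 2.0                          # hypothetical beam width (natural log units)

survivors = [
    active_after_beam(path_scores(LM_PROBS, ACOUSTIC_LOG_PROB, alpha), BEAM)
    for alpha in (1, 3, 5)
]
print(survivors)
```

As the exponent grows, the differences among path scores grow and fewer paths 
survive the same beam, which is the mechanism behind the lower AAN values 
observed for larger gaps among LM probabilities. 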

When no scaling factor was applied ($\alpha = 1$), low-smoothed LMs performed 
better for both databases. As mentioned above, low-smoothed LMs lead to a 
bigger gap among LM probabilities than high-smoothed models. Thus, LM 
probabilities are more significant in the Viterbi trellis and, as a 
consequence, WERs are lower than those obtained using high-smoothed LMs. 
Computational cost (AAN) is also lower for low-smoothed LMs because, for a 
fixed beam-search factor, when differences among probabilities increase, the 
number of paths kept in the lattice is reduced. 

Table 1. Perplexity (PP) evaluation of n-gram LMs with n = 2 ... 4 for the 
Bdgeo and Info_Tren tasks. Witten-Bell (WBd) and Add-One (AOd) discounting were 
evaluated. 

                    Bdgeo                         Info_Tren 
      high smoothing  low smoothing    high smoothing  low smoothing 
 n         WBd             AOd              WBd             AOd 
 2        13.1            13.89            36.84           57.22 
 3         7.53            8.30            34.88           69.87 
 4         7.17            7.72            36.37           77.33 



Table 2. %WER and AAN evaluation of the n-gram LMs of Table 1 with n = 2 ... 4 
for the Bdgeo and Info_Tren tasks. Witten-Bell (WBd) and Add-One (AOd) 
discounting were evaluated. 

                     Bdgeo                             Info_Tren 
          high smoothing   low smoothing     high smoothing   low smoothing 
               WBd              AOd               WBd              AOd 
 n   α     WER     AAN      WER     AAN       WER     AAN      WER     AAN 
 2   1    41.62   3964     33.29   2209      61.69   3260     56.75   2610 
     2    25.80   2588     21.60   1207      50.23   2594     47.15   1827 
     3    20.22   1508     17.33    684      43.83   1912     42.13   1199 
     4    16.99    764     14.98    416      41.08   1291     39.89    760 
     5    15.80    380     15.20    258      39.60    799     40.75    484 
     6    15.95    218     15.93    173      40.32    467     42.30    335 
     7    17.01    143     18.14    126      41.75    294     43.92    245 
 3   1    38.85   5189     28.3    2935      58.69   6400     55.60   5172 
     2    21.86   2984     16.49   1325      48.72   4668     45.21   3233 
     3    15.35   1529     12.5     633      42.14   3172     41.50   1876 
     4    11.74    702     10.98    339      38.72   1978     39.36   1060 
     5    10.82    328     11.04    193      38.01   1135     40.24    610 
     6    10.85    179     13.08    123      38.41    631     43.13    386 
     7    13.04    114     15.67     88      41.58    378     47.41    269 
 4   1    38.50   5374     28.59   3058      58.80   6480     55.20   5380 
     2    21.86   3053     16.03   1356      48.90   4720     45.00   3410 
     3    14.44   1544     11.91    640      42.25   3286     41.04   2237 
     4    10.92    704     10.89    339      38.83   2229     39.10   1250 
     5    10.24    328     10.67    190      37.84   1269     41.24    708 
     6    10.22    177     13.44    120      38.63    702     43.70    436 
     7    12.48    113     16.46     85      42.31    415     48.26    296 


As mentioned in Section 2, the gap among LM probabilities is bigger for 
low-smoothed LMs than for high-smoothed ones. The scaling factor $\alpha$ 
increases this gap. As a consequence, low-smoothed LMs need lower values of 
$\alpha$ to reach their best CSR performance (see Section 2). In any case, the 
differences between the optimum system WER obtained by low- and high-smoothing 
techniques are not very significant. 
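
The optimum scaling factor of each configuration can be read off Table 2 
mechanically. The sketch below does so for the 4-gram rows; the WER values are 
copied from Table 2, while the dictionary layout and function name are only 
illustrative choices. 

```python
ALPHAS = [1, 2, 3, 4, 5, 6, 7]

# %WER of the 4-gram LMs in Table 2, indexed by (task, discounting).
WER_4GRAM = {
    ("Bdgeo", "WBd"): [38.50, 21.86, 14.44, 10.92, 10.24, 10.22, 12.48],
    ("Bdgeo", "AOd"): [28.59, 16.03, 11.91, 10.89, 10.67, 13.44, 16.46],
    ("Info_Tren", "WBd"): [58.80, 48.90, 42.25, 38.83, 37.84, 38.63, 42.31],
    ("Info_Tren", "AOd"): [55.20, 45.00, 41.04, 39.10, 41.24, 43.70, 48.26],
}


def best_alpha(wers):
    """Return (alpha, WER) for the lowest WER in one column of the table."""
    index = min(range(len(wers)), key=wers.__getitem__)
    return ALPHAS[index], wers[index]


for key, wers in WER_4GRAM.items():
    print(key, best_alpha(wers))
```

For both tasks the low-smoothed (AOd) models reach their minimum WER at a value 
of $\alpha$ one step lower than the high-smoothed (WBd) models, and the minimum 
WER values of the two techniques are close to each other. 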

For the Bdgeo task, the best performance was obtained using 4-grams, as 
predicted by perplexity. However, for the Info_Tren task, optimum performance 
was also obtained with 4-grams for both low- and high-smoothed LMs, in spite 
of the perplexity predictions. In fact, for the Info_Tren task, perplexity 
increases strongly with n, especially using low-smoothed LMs, but WER decreases 
with n. The results obtained corroborate that PP is not the most adequate 
measure of the smoothing technique. 

It has been experimentally established that there is a strong dependence 
between the smoothing technique and the value of the scaling parameter $\alpha$ 
needed to get the best performance of the system (which in many cases is 
independent of perplexity). 

5 Concluding Remarks 

When smoothed LMs are integrated into the CSR system there are several 
heuristic parameters that must be taken into account. Due to its great effect 
on final CSR performance, the exponential scaling factor applied to LM 
probabilities is one of the most important. This factor increases the gap 
between LM probabilities to make them more competitive with acoustic 
probabilities in the Viterbi trellis. In this work, the relationship between 
the smoothing technique and the scaling factor is established. Low- and 
high-smoothed LMs have been evaluated in two Spanish tasks of very different 
difficulty. Similar optimum CSR performance could be obtained by applying the 
adequate value of the scaling factor in each case. Low-smoothed LMs reach their 
optimum CSR performance with lower values of the scaling factor than 
high-smoothed LMs because they have an “a priori” bigger gap among LM 
probabilities. Experiments showed that an increase in the test set perplexity 
of an LM does not always mean a degradation in the model performance, which 
depends fundamentally on empirical factors. 



References 

1. Ney, H., Martin, S., Wessel, F.: Statistical Language Modeling using 
leaving-one-out. In Young, S., ed.: LM. Kluwer Academic Publishers (1997) 
174-207 

2. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for 
language modeling. Computer Speech and Language 13 (1999) 359-394 

3. Rosenfeld, R.: Two decades of statistical language modeling: Where do we go 
from here? (2000) 

4. Clarkson, P., Robinson, R.: Improved language modelling through better 
language model evaluation measures. Computer Speech and Language 15 (2001) 
39-53 

5. Jelinek, F.: Five speculations (and a divertimento) on the themes of 
H. Bourlard, H. Hermansky and N. Morgan. Speech Communication 18 (1996) 242-246 

6. Klakow, D., Peters, J.: Testing the correlation of word error rate and 
perplexity. Speech Communication 38 (2002) 19-28 

7. Varona, A., Torres, I.: Back-off smoothing evaluation over syntactic 
language models. In: Proc. of European Conference on Speech Technology. 
Volume 3. (2001) 2135-2138 

8. Torres, I., Varona, A.: k-TSS language models in speech recognition 
systems. Computer Speech and Language 15 (2001) 127-149 

9. Diaz, J., Rubio, A., Peinado, A., Segarra, E., Prieto, N., Casacuberta, F.: 
Albayzin: a task-oriented Spanish speech corpus. In: First Int. Conf. on 
Language Resources and Evaluation. Volume 11. (1998) 497-501 

10. Rodriguez, L., Torres, I., Varona, A.: Evaluation of sublexical and lexical 
models of acoustic disfluencies for spontaneous speech recognition in Spanish. 
In: Proc. of European Conference on Speech Technology. Volume 3. (2001) 
1665-1668 



Selection of Lexical Units for Continuous Speech 
Recognition of Basque 



K. López de Ipiña¹, M. Graña², N. Ezeiza³, M. Hernández², E. Zulueta¹, 
A. Ezeiza³, and C. Tovar¹ 

¹ Sistemen Ingeniaritza eta Automatika Saila, Gasteiz. 
{isplopek, iepzugee}@vc.ehu.es 
² Konputazio Zientziak eta Adimen Artifiziala Saila, Donostia. 
ccpgrrom@si.ehu.es 
³ IXA group, Donostia. aitzol@si.ehu.es 
University of the Basque Country 



Abstract. The selection of appropriate Lexical Units (LUs) is an important 
issue in the development of Continuous Speech Recognition (CSR) systems. 
Words have classically been used as the recognition unit in most of them. 
However, proposals of non-word units are beginning to arise. Basque is an 
agglutinative language with some structure inside words, for which non-word 
morpheme-like units could be an appropriate choice. In this work a statistical 
analysis of units obtained after morphological segmentation has been carried 
out. This analysis shows a potential increase in confusion rates in CSR 
systems, due to the growth of the set of acoustically similar and short 
morphemes. Thus, several proposals of Lexical Units are analysed to deal with 
the problem. Measures of Phonetic Perplexity and Speech Recognition rates have 
been computed using different sets of units and, based on these measures, a set 
of alternative non-word units has been selected. 

Keywords: Lexical Units, CSR, agglutinative languages. 



1 Introduction 

This paper presents an approach to the selection of Lexical Units (LUs) for 
Continuous Speech Recognition (CSR) of Basque. This language presents a wide 
dialectal distribution, with eight main dialectal variants. This dialectal 
diversity involves differences at the phonetic, phonological and morphological 
levels. Moreover, there exists a unified Basque, a standardisation of the 
language created with the aim of overcoming dialectal differences. Nowadays, a 
significant number of speakers and most of the mass media use this standard. 
Thus, in this work unified Basque is the main reference. 

The development of a CSR system for a language involves the selection of a set 
of suitable LUs. These LUs are used not only in Language Modelling, but also to 
define the dictionaries where the acoustic-phonetic models can be integrated. 
Classically, words have been used as LUs in most CSR systems. However, some 
recent proposals point to non-word units as alternative LUs for some languages. 
In fact, for languages whose words are not clearly delimited inside sentences, 
such as Japanese [1], or whose words have some internal structure, such as 
Finnish, German, Basque, etc., these alternative units seem to be more 
suitable. There have been several proposals for alternative LUs, such as 
morphemes [1], automatically selected non-word units [2], etc. Thus, taking 
into account the morphological structure of Basque, the use of morphemes seems 
to be an appropriate approach. 



Table 1. Main characteristics of the textual databases 

                                      STDBASQUE   NEWSPAPER   BCNEWS 
Text amount                           1.6M        1.3M        2.5M 
Number of words                       197,589     166,972     210,221 
Number of pseudo-morphemes            346,232     304,767     372,126 
Number of sentences                   15,384      13,572      19,230 
Vocabulary size in words              50,121      38,696      58,085 
Vocabulary size in pseudo-morphemes   20,117      15,302      23,983 



The following section describes the main morphological features of the language and 
details the statistical analysis of morphemes using three different textual samples. 
Section 3 presents the experiments and the evaluation criteria that have been used. 
Finally, conclusions are summarised in section 4. 



2 Morphological Features of Basque 

Basque is an agglutinative language with a special morpho-syntactic structure 
inside the words [3] [4] that may lead to intractable word vocabularies for a 
CSR system when the task is large. A first approach to the problem is to use 
morphemes instead of words to define the system vocabulary [4]. This approach 
has been evaluated over three textual samples, analysing both the coverage and 
the Out of Vocabulary (OOV) rate when words and pseudo-morphemes obtained by 
the automatic morphological segmentation tool AHOZATI [5] are used. Table 1 
shows the main features of the three textual samples: size, number of words and 
pseudo-morphemes, and vocabulary size in both words and pseudo-morphemes for 
each database. The first important outcome of our analysis is that the 
vocabulary size in pseudo-morphemes is about 60% smaller than the vocabulary 
size in words in all cases (Fig. 1). Regarding the unit size, Fig. 2 shows the 
plot of the Relative Frequency of Occurrence (RFO) of the pseudo-morphemes and 
words versus their length in characters over the textual sample STDBASQUE. 
Although only 10% of the pseudo-morphemes in the vocabulary have fewer than 4 
characters, such small morphemes have an Accumulated Frequency of about 40% in 
the databases [5] (the Accumulated Frequency is calculated as the sum of the 
RFO of the individual pseudo-morphemes). 
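
The reduction of about 60% can be checked from the vocabulary sizes listed in 
Table 1. The short sketch below only redoes that arithmetic; the dictionary 
layout and function name are illustrative, and the numbers are those of 
Table 1. 

```python
# (vocabulary size in words, vocabulary size in pseudo-morphemes) from Table 1
VOCABULARY = {
    "STDBASQUE": (50121, 20117),
    "NEWSPAPER": (38696, 15302),
    "BCNEWS": (58085, 23983),
}


def vocabulary_reduction(n_words, n_pseudo_morphemes):
    """Relative reduction of the vocabulary when pseudo-morphemes replace words."""
    return 1.0 - n_pseudo_morphemes / n_words


for sample, (n_words, n_morphs) in VOCABULARY.items():
    print(sample, f"{100 * vocabulary_reduction(n_words, n_morphs):.1f}%")
```

The three samples give reductions between roughly 58% and 61%. 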

To check the validity of the unit inventory, units having fewer than 4 
characters and having plosives at their boundaries were selected from the 
texts. They represent some 25% of the total. This high number of small and 
acoustically difficult recognition units could lead to an increase in acoustic 
confusion, and could also generate a high number of insertions (Fig. 3, over 
the textual sample EGUNKARIA). 

[Figure: bar chart of vocabulary size (vertical axis from 0 to 70000) for the 
vocabulary of words and the vocabulary of pseudo-morphemes; series EUSEST, 
EGUNKARIA and IRRATEL] 

Fig. 1. Vocabulary size of the words and pseudo-morphemes 

Fig. 2. Relative Frequency of Occurrence (RFO) of the words and 
pseudo-morphemes in relation to their length in characters (STDBASQUE sample) 

Finally, Fig. 4 shows the analysis of coverage and Out of Vocabulary rate over the 
textual sample BCNEWS. When pseudo-morphemes are used, the coverage in texts is 
better and complete coverage is easily achieved. OOV rate is higher in this sample. 
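
Coverage and OOV rate can be computed from unit counts as sketched below. This 
is only an illustration of the two measures: the tiny word lists and their 
hand-made segmentation are hypothetical (they are not the output of AHOZATI), 
and the function names are assumptions. 

```python
from collections import Counter


def coverage(train_units, vocab_size):
    """Share of training tokens covered by the vocab_size most frequent units."""
    counts = Counter(train_units)
    covered = sum(count for _, count in counts.most_common(vocab_size))
    return covered / len(train_units)


def oov_rate(train_units, test_units, vocab_size=None):
    """Share of test tokens that are not in the training vocabulary."""
    counts = Counter(train_units)
    vocab = {unit for unit, _ in counts.most_common(vocab_size)}
    unknown = sum(1 for unit in test_units if unit not in vocab)
    return unknown / len(test_units)


# Hypothetical toy corpus, as words and as hand-segmented pseudo-morphemes.
train_words = "etxea etxean mendia mendian etxetik".split()
train_morphs = "etxe a etxe an mendi a mendi an etxe tik".split()
test_words = "menditik etxea".split()
test_morphs = "mendi tik etxe a".split()

print(coverage(train_words, 2), coverage(train_morphs, 2))
print(oov_rate(train_words, test_words), oov_rate(train_morphs, test_morphs))
```

In this toy case the unseen word is built from known pseudo-morphemes, so it 
counts as out of vocabulary only when words are used as units. 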



Fig. 3. Relative Frequency of Occurrence (RFO) of small and acoustically 
difficult recognition units (EGUNKARIA sample) 

Fig. 4. Coverage (a) and OOV rate (b) for the textual sample BCNEWS 



3 Experimentation 

3.1 Description of the Tasks 

Appropriate tasks with controlled vocabularies are required to test LMs and/or 
LUs. Two tasks have been created [4] for this purpose: 

a) The Miniature Language Acquisition (MLA) task is the language used by a 
computer system to give examples of pictures paired with true statements about 
those pictures. The task in Basque has 15,000 sentences with about 150,000 
words and a vocabulary of 47 words. It has very low perplexity and a very 
restricted vocabulary. It was created for preliminary CSR experiments. 

b) Basic Vocabulary of Basque (BVB) is a task based on a beginner's level of 
Basque. The task consists of 5,000 sentences with about 30,000 words and a 
vocabulary of 3,500 words. Most of the features of the language described in 
Section 2 are present in this task. It has a high perplexity compared to the 
MLA task, and it was created to measure the precision of the system when a 
larger-scale task is used. 

Both tasks were automatically segmented into pseudo-morphemes by AHOZATI. The 
vocabulary of the MLA task is reduced to 35 pseudo-morphemes and that of the 
BVB task to 1,900. Finally, a segmentation into N-WORDS was obtained, resulting 
in 40 and 2,500 different vocabulary units for the MLA and BVB tasks 
respectively. The sentences of the MLA task were divided into 14,500 sentences 
for training and 500 for test, and the sentences of the BVB task into 4,000 for 
training and 500 for test. 20 speakers, 10 males and 10 females, recorded both 
tasks, obtaining 400 sentences for MLA and 800 sentences for BVB. In the speech 
recognition experiments a subset of BVB (MBVB) was used. The subset has a 
vocabulary size of 550 for WORDS, 400 for PS-MORPHS and 500 for N-WORDS. 



3.2 Evaluation Criteria 



a) A perplexity function to evaluate the influence of the LUs on the LM. The 
classical perplexity function used to evaluate LMs might not be valid in this 
case, since it depends on the units used to compose sentences. Therefore, the 
evaluation must be based on an invariant unit, such as the phoneme. Thus, 
Phonetic Perplexity will be used to validate LUs. This perplexity is expressed 
as in [6]: 


$$PP = 2^{-\frac{1}{F}\sum_{i=1}^{K} \log_2 \mathrm{Prob}(W_i \mid M)} = P^{K/F} \qquad (1)$$



where PP is the Phonetic Perplexity function, P is the perplexity and F and K 
are the number of phonemes and units composing the sentences, respectively. The 
CMU-Cambridge Toolkit [6] has been used to calculate both PP and P for 
different N-gram lengths. 

b) Speech Recognition experiments without LM have been carried out to evaluate 
both the influence of the acoustic confusion of LUs and the insertion of short 
LUs. Moreover, Recognition Rates for LUs (LURR) have been analysed using the 
raw stream of LUs (LURR-NA) and also the stream of words obtained after 
aligning the non-word LUs to words (LURR-A), using simple information about the 
set of words. A set of 28 Context-Independent Sublexical units modelled by 
Discrete HMMs with four codebooks is used as acoustic models. 

c) The computational cost of the experiments is also tested. We evaluate the 
Computational Time (CT), measured in ms (real-time operation corresponds to 
10 ms), and the Time Weighted LURR (T-LURR). 
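
The idea behind criterion a) can be sketched as follows, assuming that Phonetic 
Perplexity normalises the log probability of the sentences by the number of 
phonemes F instead of the number of units K, so that $PP = P^{K/F}$. This 
normalisation, the function names and the log probabilities below are 
assumptions made for illustration; they are not the output of the 
CMU-Cambridge Toolkit. 

```python
import math


def unit_perplexity(unit_log2_probs):
    """Classical perplexity P: normalised by the number of units K."""
    k = len(unit_log2_probs)
    return 2 ** (-sum(unit_log2_probs) / k)


def phonetic_perplexity(unit_log2_probs, n_phonemes):
    """Phonetic Perplexity PP: same log probability, normalised by F phonemes."""
    return 2 ** (-sum(unit_log2_probs) / n_phonemes)


# Hypothetical sentence of F = 10 phonemes, scored with two unit inventories.
F = 10
word_scores = [-4.0, -6.0]                 # K = 2 word units
morph_scores = [-2.0, -3.0, -2.0, -3.0]    # K = 4 pseudo-morpheme units

for scores in (word_scores, morph_scores):
    print(len(scores), unit_perplexity(scores), phonetic_perplexity(scores, F))
```

Both inventories assign the same total probability to the sentence, so their 
Phonetic Perplexity coincides even though the classical perplexity differs 
with the number of units. 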



3.3 Preliminary Experiment 

The morphological features of the language analysed above make the selection 
of appropriate LUs for CSR difficult. Furthermore, the statistical measures of 
morphemes suggest that the performance of the Acoustic Phonetic Decoding system 
could be worse due to several factors. On the one hand, acoustically similar 
morphemes could increase acoustic confusion. On the other hand, the number of 
short units could also increase the number of insertions [5]. 

Table 2. Recognition rates (LURR) using the three sets of lexical units: WORDS, 
N-WORDS and PS-MORPHS 





                         MLA                                    MBVB 
            LURR-NA  LURR-A  LURR-BIGR  CT  T-LURR   LURR-NA  LURR-A  LURR-BIGR  CT  T-LURR 
WORDS        80.61   80.61    91.34      6   13.4     43.71   43.71    48.44     33   1.46 
N-WORDS      74.?4   76.?0    8?.82      5   14.96    30.09   32.60    42.07     28   1.50 
PS-MORPHS    60.29   6?.80    82.38      ?   20.09    28.98   29.09    ?9.85     2?   1.?9 


Three sets of LUs are used in the experiments [4]: 

1. WORDS: words are our baseline LU set. 

2. PS-MORPHS: these pseudo-morpheme units are morphemes automatically 
obtained and slightly transformed for Speech Recognition by ad-hoc rules [5]. 

3. N-WORDS: an alternative proposal. Pseudo-morphemes shorter than 3 
characters with a high level of confusion are merged with adjacent units [5]. 
This proposal reduces the vocabulary size by about 25% with respect to WORDS. 
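
The merging step behind N-WORDS can be pictured with the sketch below. It is a 
hypothetical simplification, not the rule set of [5]: short units are attached 
to the preceding unit, the optional set of confusable units stands in for the 
confusion criterion, and the segmented examples are hand-made. 

```python
def merge_short_units(units, min_len=3, confusable=None):
    """Attach units shorter than min_len to the preceding unit.

    If `confusable` is given, only short units listed in it are merged.
    A short unit at the start of the sequence is left unchanged.
    """
    merged = []
    for unit in units:
        is_short = len(unit) < min_len
        is_confusable = confusable is None or unit in confusable
        if is_short and is_confusable and merged:
            merged[-1] += unit          # merge with the adjacent (previous) unit
        else:
            merged.append(unit)
    return merged


print(merge_short_units(["etxe", "an", "mendi", "tik"]))
print(merge_short_units(["etxe", "an", "eta", "gu"], confusable={"an"}))
```

Merging removes the shortest units from the recognition vocabulary at the cost 
of a larger inventory than plain pseudo-morphemes, which matches the 
intermediate vocabulary sizes reported for N-WORDS. 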



3.4 Experimental Results 

Experiments with the WORDS and PS-MORPHS sets were carried out to analyse the 
influence of the morphological structure on the recognition of the LUs. 
Measures of PP were computed for different values of N. Fig. 5 shows lower PP 
for WORDS than for PS-MORPHS in both tasks. The results of the speech 
recognition experiments also show better performance for WORDS than for 
PS-MORPHS in both tasks (Table 2). This is due to the frequent confusion and 
the high number of insertions in the case of the shortest pseudo-morphemes. 
Consequently, the alignment improves the results and reduces the insertion of 
short LUs. However, WORDS still obtained better results than PS-MORPHS. With 
regard to CT and T-LURR, the advantage is for PS-MORPHS. Regarding the BVB 
task, it can be observed that the overall results are worse than in MLA 
(Table 2), but it must be taken into account that the perplexity of the task is 
considerably higher [4]. The results show that PS-MORPHS has worse recognition 
results but better results with regard to computational cost. 

The experiments using the new LUs, N-WORDS, show that PP is lower than for 
PS-MORPHS (Fig. 5) and closer to the perplexity measure for WORDS. Table 2 
also indicates that N-WORDS outperforms PS-MORPHS for the MLA and MBVB tasks 
with or without alignment. Moreover, the recognition rate of N-WORDS is closer 
to the rate for WORDS in both tasks. In Table 2, N-WORDS shows the best balance 
of LURR and computational cost (CT and T-LURR). 

Finally, Table 2 shows the performance of the system with a bigram Language 
Model. The introduction of a Language Model improves all the results, but the 
increase in performance is more significant for non-word LUs. 



4 Concluding Remarks 

This work deals with the selection of appropriate LUs for the Basque language. Since 
Basque is an agglutinative language, non-word units could be an adequate choice for 
LUs. First, morphemes and words have been tested, including a statistical analysis of 
morphemes in Basque. This analysis shows a large number of short and acoustically 
similar morphemes, leading to poor performance of the CSR system. Measures of 
phonetic perplexity and computational cost, together with speech recognition experiments, have 
been used to validate both proposals. Although the word model obtains the best 
results, it becomes intractable for medium to large dictionaries. Thus, a new set of 
non-word units has been created based on morphemes. This proposal shows an appropriate 
performance of the system and reduces the problems raised by morphemes. In future 
work the obtained sets of LUs will be evaluated in a LVCSR system. 
Abstract. In this paper we present the creation of a Mexican Spanish 
version of the CMU Sphinx-III speech recognition system. We 
trained acoustic and N-gram language models with a phonetic set of 23 
phonemes. Our speech data for training and testing was collected from an 
auto-attendant system in a telephone environment. We present experiments 
with different language models. Our best result scored an overall 
error rate of 6.32%. With this version it is now possible to develop speech 
applications for Spanish-speaking communities. This version of the CMU 
Sphinx system is freely available for non-commercial use upon request. 



1 Introduction 

Today, building a new robust Automatic Speech Recognition (ASR) system is a 
task requiring many years of effort. At the Autonomous University of Tlaxcala, Mexico, 
we have two goals in the ASR field: to do research towards a robust speech 
recognizer, and to build speech applications for automating services. In order to 
achieve our goals in a short time, we needed a baseline system. We found 
that the CMU (Carnegie Mellon University) Sphinx speech recognition system 
is freely available and is currently one of the most robust speech recognizers for 
English. The CMU Sphinx system enables research groups with modest budgets 
to quickly begin conducting research and developing applications. This arrangement 
is particularly pertinent in Latin America, where the financial support and 
experience otherwise necessary to support such research are not readily available. 
In the past, few research efforts have been made for Spanish; these include 
work from CMU on broadcast news transcription [1, 2], where basically acoustic 
and language models were trained. Our motivation for developing this 
work is that many applications require a speech recognizer for 
Spanish, and that Spoken Dialogue Systems (SDS) require a robust speech 
recognizer that can be reconfigured and retrained as necessary. 
In this research, we have generated a lexicon and trained acoustic and language 
models with Mexican Spanish speech data for the CMU Sphinx speech 
recognition system. Our experiments are based on data collected from an 
auto-attendant application (CONMAT) deployed in Mexico [3], with a vocabulary of 
2,288 entries consisting of names of people and places inside a university, including 
synonyms. The speech data used for training and testing was filtered to exclude noisy 
utterances. Results are given in terms of the well-known evaluation metric Word 
Error Rate (WER). In the remainder of the paper, we first provide an overview 
of the system in Section 2. In Section 3 we describe the components of the Sphinx 
system and how they were trained. In Section 4 we present experimental results. 
Finally, in Section 5 we provide our conclusions and future directions. 



2 System Overview 

The Carnegie Mellon University Sphinx-III system is a frame-based, HMM-based, 
speaker-independent, continuous speech recognition system, capable of 
handling large vocabularies (see Fig. 1). Word modeling is based 
on subword units, in terms of which all the words in the dictionary are 
transcribed. Each subword unit considered in its immediate context (triphone) is 
modeled by a 5-state left-to-right HMM. Data is shared across states of 
different triphones. The groups of HMM states that share distributions among 
their member states are called senones [4]. 




[Diagram: raw audio signal to 13-dimensional speech features] 




Fig. 1. Architecture of the CMU Sphinx-III speech recognition system. The lexical 
or pronunciation model contains pronunciations for all the words of interest to the 
decoder. Acoustic models are based on statistical Hidden Markov models (HMMs). 
Sphinx-III uses a conventional backoff bigram or trigram language model. The result 
is a recognition hypothesis with a word lattice representing an N-best list. 
The feature vector computation is a two-stage process. In the first stage, an 
off-line front-end module processes the raw audio sample 
stream into a cepstral stream. The input is windowed, resulting in frames of 
duration 25.625 ms. The output is a stream of 13-dimensional real-valued cepstrum 
vectors. The frames overlap, resulting in a rate of 100 vectors/sec. In the 
second stage, the stream of cepstrum vectors is converted into a feature stream. 
This process consists of a Cepstral Mean Normalization (CMN) step and an Automatic 
Gain Control (AGC) step. The final speech feature vector is typically created by 
augmenting the cepstrum vector (after CMN and AGC) with one or more time 
derivatives. Here, the feature vector in each frame is computed by concatenating the first 
and second derivatives to the cepstrum vector, giving a 39-dimensional vector. 
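
The second stage can be illustrated with a minimal Python sketch. It is not the Sphinx front end: the derivative here is a plain symmetric difference with clamped edges (an assumption, since the paper does not state the window used), the gain control step is omitted, and the input is toy data. It only shows how 13 cepstral coefficients become a 39-dimensional vector. 

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def deltas(frames):
    """Assumed derivative: x[t+1] - x[t-1], clamping indices at the edges."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, last)]
        out.append([b - a for a, b in zip(prev, nxt)])
    return out


def build_features(cepstra):
    """Concatenate normalized cepstrum, first and second derivatives."""
    c = cepstral_mean_normalize(cepstra)
    d1 = deltas(c)
    d2 = deltas(d1)
    return [x + y + z for x, y, z in zip(c, d1, d2)]


# Toy input: 5 frames of 13 coefficients each (not real speech data).
cepstra = [[float(t * (k + 1)) for k in range(13)] for t in range(5)]
features = build_features(cepstra)
print(len(features), len(features[0]))
```

Each output frame has 13 + 13 + 13 = 39 components, matching the dimensionality stated above. 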

3 System Components 

3.1 Lexicon 

The lexicon development process consisted of defining a phonetic set and gener- 
ating the word pronunciations for training acoustic and language models. 



Table 1. ASCII Phonetic Symbols for Mexican Spanish. 



Manner      Label  Example  Worldbet Word 
Plosives    p      punto    p u n t o 
            b      baños    b a ñ o s 
            t      tino     t i n o 
            d      donde    d o n d e 
            k      casa     k a s a 
            g      ganga    g a n g a 
Fricatives  f      falda    f a l d a 
            s      mismo    m i s m o 
            x      jamás    x a m a s 
Affricates  tS     chato    tS a t o 
Nasals      m      mano     m a n o 
            n      nada     n a d a 
            ñ      baño     b a ñ o 
Semivowels  l      lado     l a d o 
            L      pollo    p o L o 
            r(     pero     p e r( o 
            r      perro    p e r o 
            w      hueso    w e s o 
Vowels      i      piso     p i s o 
            e      mesa     m e s a 
            a      caso     k a s o 
            o      modo     m o d o 
            u      cura     k u r( a 

Our approach for modeling Mexican Spanish phonetic sounds in the CMU 
Sphinx-III speech recognition system consisted of adapting the 
WORLDBET Castilian Spanish phonetic set [5], which resulted in the 23 phonemes 
listed in Table 1. The adaptation consisted of a manual comparison of 
spectrograms from words including a common phoneme; we found similar sounds, 
which we merged in our final list of phonemes. The following are the 
modifications made to the Castilian Spanish sound set to generate a Mexican Spanish 
version: 

- Fricative /s/ as in “kasa” and fricative /z/ as in “mizmo” merged into /s/, 

- Plosive /b/ as in “baños” and fricative /V/ as in “aVa” merged into /b/, 

- Plosive /d/ as in “donde” and fricative /D/ as in “deDo” merged into /d/, 

- Plosive /g/ as in “ganga” and fricative /G/ as in “lago” merged into /g/, 

- Semivowels /j/ as in “majo” and /L/ as in “poLo”, and affricate /dZ/ as 
in “dZugo” merged into /L/, 

- Nasal /n/ as in “nada” and nasal /N/ as in “baNko” merged into /n/, 

- Fricative /T/ as in “luTes” was deleted because this sound does 
not exist in Mexican Spanish. 

The vocabulary has 2,288 words, consisting of names of people 
and places inside a university, including synonyms. The automatic generation 
of pronunciations was performed using a simple list of rules and exceptions. 
The rules determine the mapping of clusters of letters into phonemes, and the 
exceptions list covers some words with irregular pronunciations. A Finite State 
Machine (FSM) was used to generate the pronunciations from the word list. 
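
The rules-plus-exceptions idea can be sketched as follows. This is only an illustration under stated assumptions: the rule list, the exception entry and the greedy longest-match scan are hypothetical stand-ins, not the authors' rule set or their FSM, and the sketch ignores context effects such as the trilled word-initial "r". 

```python
# Hypothetical exceptions: words the letter rules would transcribe wrongly.
EXCEPTIONS = {"mexico": ["m", "e", "x", "i", "k", "o"]}

# Hypothetical letter-cluster rules; longer clusters are listed before the
# single letters they contain so that the first match is the longest one.
RULES = [
    ("ch", ["tS"]), ("ll", ["L"]), ("rr", ["r"]), ("qu", ["k"]),
    ("ce", ["s", "e"]), ("ci", ["s", "i"]),
    ("c", ["k"]), ("j", ["x"]), ("h", []), ("v", ["b"]),
    ("z", ["s"]), ("r", ["r("]),
]


def pronounce(word):
    """Map a word to phoneme labels: exceptions first, then cluster rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phones, i = [], 0
    while i < len(word):
        for cluster, mapped in RULES:
            if word.startswith(cluster, i):
                phones.extend(mapped)
                i += len(cluster)
                break
        else:
            # No rule applies: the letter maps to the phoneme of the same name.
            phones.append(word[i])
            i += 1
    return phones


for w in ["casa", "chato", "pero", "perro", "pollo"]:
    print(w, " ".join(pronounce(w)))
```

For these five words the sketch reproduces the transcriptions given in Table 1; a real rule set would need many more context-dependent rules. 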

3.2 Acoustic Models 

Training acoustic models requires a set of feature files computed from the 
audio training data, one for every recording in the training corpus. Each 
recording is transformed into a sequence of feature vectors consisting of the 
Mel-Frequency Cepstral Coefficients (MFCCs). The training of acoustic models 
is based on utterances without noise. This training was performed using 3,375 
utterances of speech data from an auto-attendant system, whose context is names 
of people and places inside a university. 

The training process (see Fig. 2) consists of the following steps. First, obtain a 
corpus of training data. Then, for each utterance, convert the audio data to a stream 
of feature vectors, convert the text into a sequence of linear triphone HMMs 
using the pronunciation lexicon, and find the best state sequence or state 
alignment through the sentence HMM for the corresponding feature vector sequence. 
Finally, for each senone, gather all the frames in the training corpus that mapped to 
that senone in the previous step and build a suitable statistical model for the 
corresponding collection of feature vectors. The circularity in this training 
process (the alignment requires models, and the models require an alignment) is 
resolved using the iterative Baum-Welch or forward-backward training 
algorithm. Because continuous density acoustic models are 
computationally expensive, a model can be built by sub-vector quantizing the acoustic 
model densities (sub-vector quantizing was turned off in our work). 
Fig. 2. A block schematic diagram for training acoustic models. 



3.3 Language Models 

The main Language Model (LM) used by the Sphinx decoder is a conventional 
bigram or trigram backoff language model. Our LMs were constructed from 
the 2,288-word dictionary using the CMU-Cambridge statistical language model 
toolkit version 2.0 [6] (see Fig. 3). The training data consisted of 3,375 transcribed 
utterances of speech data from an auto-attendant system. We trained bigrams 
and trigrams with four discounting strategies: Good Turing, Absolute, Linear, 
and Witten Bell. The LM probability of an entire sentence is the product of the 
individual word probabilities. The output from the CMU-Cambridge toolkit is 
an ASCII text file, and because this file can be very slow to load into memory, 
the LM must be compiled into a binary form. The decoder uses a disk-based 
LM strategy to read the binary into memory. Although the CMU Sphinx 
recognizer is capable of handling out-of-vocabulary speech, we did not set any filler 
models. Finally, the recognizer needs to exponentiate the LM probability using 
a language weight before combining the result with the acoustic likelihood. 
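
The last two statements can be made concrete with a small sketch in the log domain, where the product of word probabilities becomes a sum and exponentiation by the language weight becomes a multiplication. The bigram probabilities, the acoustic score and the weight value below are made-up illustrative numbers, and backoff is deliberately left out. 

```python
import math


def sentence_log_prob(words, bigram_probs, start="<s>"):
    """Log of the product of bigram word probabilities (no backoff here)."""
    total, prev = 0.0, start
    for w in words:
        total += math.log(bigram_probs[(prev, w)])
        prev = w
    return total


def combined_score(acoustic_log_likelihood, lm_log_prob, language_weight):
    """log( P_acoustic * P_lm ** language_weight ), computed with logs."""
    return acoustic_log_likelihood + language_weight * lm_log_prob


# Hypothetical toy model and scores, for illustration only.
bigram_probs = {("<s>", "juan"): 0.5, ("juan", "perez"): 0.25}
lm = sentence_log_prob(["juan", "perez"], bigram_probs)
score = combined_score(-100.0, lm, language_weight=9.5)
print(math.exp(lm), score)
```

A larger language weight makes the decoder trust the LM more relative to the acoustic models; the appropriate value is normally tuned on held-out data. 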




Fig. 3. A block schematic diagram for training language models. 
4 Experimental Results 

4.1 Experimental Setup 

We performed two experiments for evaluating the performance of the CMU 
Sphinx system trained with Mexican speech data (872 utterances) in the con- 
text of an auto-attendant application: the first experiment considered names of 
people and places as independent words (i.e. any combination of first names and 
last names was allowed), the second experiment considered names of people and 
places as only one word. Each experiment was evaluated with two different LMs. 

4.2 Evaluation Criteria 

The evaluation of each experiment was made according to recognition accuracy, 
computed using the WER (Word Error Rate) metric defined by equation 1. 
The recognized word string is aligned against the correct word string, and the 
number of substitutions (S), deletions (D), and insertions (I) is computed relative to 
the number of words in the correct sentence (N). 

WER = (S + D + I) / N * 100%. (1) 
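
As an illustration of the metric, the following sketch computes WER with a standard dynamic-programming word alignment (Levenshtein distance), whose minimum cost equals S + D + I. It is a generic implementation, not the scoring tool used in the experiments, and the example names are invented. 

```python
def wer(reference, hypothesis):
    """Word error rate in percent between two whitespace-separated strings."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


print(wer("juan perez lopez", "juan lopez"))   # one deletion out of three words
print(wer("juan perez", "juan de perez"))      # one insertion out of two words
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference. 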



4.3 Results 

Recognition results for the CMU Sphinx system on the Mexican 
Spanish test data are shown in Tables 2 and 3. In table 2 (experiment 1), we 
can observe that the Good Turing discounting strategy performs poorly, 
and that the n-gram order does not make much difference; perhaps bigger 
training and test sets would yield significant differences. For 
this experiment the best option is bigrams with Witten Bell discounting, 
but we observed a problem with this approach: the experiment can yield 
incorrect hypotheses, i.e. nonexistent names of people and places. Thus, another 
solution was necessary. In table 3 (experiment 2), we 
observe that, under the conditions of the experiment, different n-grams yield no further 
significant improvements. Despite this, the best result 
is obtained with trigrams and Witten Bell discounting. 



Table 2. Word error rate (%) in the test set after decoding in experiment 1, which 
considered names of people and places as independent words. 

Discounting Strategy  Bigrams  Trigrams 
Good Turing           12.95    12.88 
Absolute               7.82     7.63 
Linear                 7.94     8.07 
Witten Bell            7.63     7.75 

Table 3. Word error rate (%) in the test set after decoding in experiment 2, which 
considered names of people and places as only one word. 

Discounting Strategy  Bigrams  Trigrams 
Good Turing           6.88     6.44 
Absolute              6.38     6.38 
Linear                6.50     6.57 
Witten Bell           6.38     6.32 



5 Conclusions and Future Work 

We described the training and evaluation processes of the CMU Sphinx-III 
speech recognition system for Mexican Spanish. We performed two experiments 
in which we grouped the word dictionary entries differently. Our best results 
were obtained by treating dictionary entries as single words, which avoids 
nonexistent names of people and places inside a university. Through a simple 
lexicon and a set of acoustic and language models, we demonstrated an accurate 
recognizer which scored an overall error rate of 6.32% on in-vocabulary speech 
data. We achieved the goal of this work: we now have a baseline 
system for performing research in speech recognition, which is an important 
component of spoken language systems. Also, with this work we can start 
developing speech applications, with the advantage that we can retrain and 
adapt the recognizer according to our needs. This work was motivated by 
the fact that people around the world need to develop applications involving 
speech recognition for Spanish-speaking communities. Therefore, the resulting 
lexicon, acoustic models and language models are freely available for non-commercial 
purposes upon request. 

An immediate piece of future work is to provide a bridge for invoking the 
recognizer as a black box; perhaps we can build a DLL file or 
provide something similar to SAPI. This is indispensable for programmers who 
need to develop speech applications from different programming environments. 
Another important future direction arises because this development considers 
only in-vocabulary speech: we plan to retrain the recognizer considering 
Out-Of-Vocabulary (OOV) speech, measuring the computational overhead. 
OOV speech is an important factor in spoken dialogue systems 
and significantly degrades the performance of such systems [7]. Also, we plan to 
train Sphinx in different domains, as well as to optimize configuration parameters. 
Finally, we plan to train Sphinx release 4, which was implemented in Java, and 
make a comparison between Sphinx III and Sphinx 4 in Spanish domains. All 
this work would be performed using a bigger corpus. 
Abstract. This paper presents a new methodology, based on classical 
decision trees, to get a suitable set of context dependent sublexical units for 
Basque Continuous Speech Recognition (CSR). The original method proposed 
by Bahl [1] was applied as the benchmark. Then two new features were added: a 
data massaging step to emphasise the data and a fast and efficient Growing and 
Pruning algorithm for DT construction. In addition, the use of the new context 
dependent units to build word models was addressed. The benchmark Bahl 
approach gave recognition rates clearly outperforming those of context 
independent phone-like units. Finally, the new methodology improves over the 
benchmark DT approach. 

Keywords: Sublexical Units, Decision Trees, Growing and Pruning Algorithm 



1 Introduction 



The choice of a suitable set of sublexical units is one of the most important issues in 
the development of a Continuous Speech Recognition (CSR) system. As shown in the 
literature, authors have proposed a wide range of them: diphones, triphones and other 
context dependent units, transitional units or demiphones. Such a variety of 
approaches aims at accurately modelling the influence of contexts on the realisation of 
Phone Like Context Independent Sublexical Units (PL-CI-SLUs). A system 
can exploit the benefits of context modelling by using context dependent sublexical 
units to generate lexical baseforms, taking into account not only intraword but also 
between-word contexts, as we will see. 

Decision Trees (DT) are one of the most common approaches to the problem of 
selecting a suitable set of context dependent sublexical units (DT-CD-SLUs) for 
speech recognition [1][2][3]. 
DT combine the advantages of applying some phonetic knowledge about how 
contexts affect the articulation of speech and a strictly quantitative validation 
procedure based on the likelihood of speech samples with regard to some probabilistic 
models. 

In this work DT have been used to model both intraword and between-word 
context dependencies. Starting from the classical scheme [1], some attempts have 
been made in order to improve the accuracy and the discriminative power of the 
models. An alternative methodology, the fast and efficient Growing and Pruning 
algorithm [4], has also been applied to build the decision trees. 

The paper is organised as follows. Section 2 reviews the basic DT 
methodology, describing more carefully those points where major changes have been 
introduced. Section 3 presents the alternative DT methodology, based on the Growing 
and Pruning algorithm and on the data massaging. In section 4, the issue of between- 
word context modelling is discussed and some solutions are proposed. Finally, in 
Section 5 DT-based Context Dependent and Semicontextual Units (DT-CD-SLUs and 
DT-SC-SLUs) are applied to a Basque CSR task, and experimental results are 
discussed. Conclusions are summarised in section 6. 



2 The Baseline Methodology 



Firstly, automatic segmentation of the training corpus was carried out to get the set of 
samples corresponding to each of the PL-CI-SLUs, each sample consisting of a string 
of labels obtained by vector quantization of the acoustic observation vectors. In fact, 
four different strings of labels were used simultaneously, each corresponding to a 
different acoustic observation VQ codebook. Each DT, associated with a given 
PL-CI-SLU, was built as follows. All the samples corresponding to that PL-CI-SLU were 
assigned to the root node. 

Then a set of binary questions, manually established by an expert phonetician and 
related to one or more left and right contexts, was used to classify the samples. Any 
given question Q divided the set of samples Y into two subsets, Y_l and Y_r. The resulting 
subsets were evaluated according to a quality measure, a Goodness of Split (GOS) 
function, reflecting how much the likelihood of the samples increased with the split. 
Heuristic thresholds were applied to discard those questions yielding low likelihood gains 
(GOS threshold) or unbalanced splits (trainability threshold). Among the remaining 
questions, the one giving the highest quality was chosen, producing two new nodes 
(left and right), with the samples partitioned according to the answer (YES/NO) to 
that question. This procedure was iterated until no question exceeded the quality 
thresholds. 

Following the classical scheme, a simple histogram was used to model acoustic 
events, each component of the histogram being modelled as a Poisson distribution. In 
fact, the model consisted of four different histograms, whose likelihoods were 
multiplied to yield the combined likelihood. To evaluate the quality of the splits the 
classical GOS function was applied: 

GOS = log [ P(Y_l|M_l) P(Y_r|M_r) / P(Y|M) ] 

where Y_l and Y_r stand for the sets of samples resulting from the split of set Y that were 
used to train models M_l and M_r respectively; P(Y|M) is the joint likelihood of a set 
of samples Y with regard to a previously trained model M. This GOS function 
measures the likelihood improvement resulting from the split, i.e. from the question Q. 
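
A minimal sketch of this split evaluation follows, assuming that the GOS is the log-likelihood gain of modelling the two subsets separately instead of jointly, and that each histogram component follows an independent Poisson distribution whose rate is the sample mean. It uses a single histogram rather than four, a small rate floor (our assumption, to avoid log 0), and toy label counts. 

```python
import math


def train_poisson(samples, floor=1e-3):
    """Maximum-likelihood Poisson rate per histogram component (floored)."""
    n, dims = len(samples), len(samples[0])
    return [max(sum(s[d] for s in samples) / n, floor) for d in range(dims)]


def log_likelihood(samples, rates):
    """Joint log-likelihood of count histograms under independent Poissons."""
    total = 0.0
    for s in samples:
        for k, lam in zip(s, rates):
            total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def goodness_of_split(left, right):
    """Log-likelihood gain of modelling left and right subsets separately."""
    parent = left + right
    return (log_likelihood(left, train_poisson(left))
            + log_likelihood(right, train_poisson(right))
            - log_likelihood(parent, train_poisson(parent)))


# Toy histograms of VQ label counts (two labels per sample).
separating = goodness_of_split([[5, 0], [6, 1]], [[0, 5], [1, 6]])
useless = goodness_of_split([[2, 2]], [[2, 2]])
print(separating, useless)
```

A question that separates acoustically different samples yields a clearly positive gain, whereas a question that splits identical samples yields a gain of zero, which is what the GOS threshold is meant to filter out. 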

3 Methodological Improvements 

As said above, DTs were grown until one of the stopping criteria was met. Two 
thresholds were used, the first one establishing a minimum GOS value, the second one 
giving the minimum number of training samples. After some preliminary 
experimentation, adequate values were heuristically fixed for these thresholds. This is 
a very simple but inconvenient way to stop the growing procedure, because thresholds 
must be fixed for each training database. 

An alternative methodology was designed to overcome this problem, based on the fast 
and efficient Growing and Pruning (G&P) algorithm [4]. 

The G&P algorithm divides the set of training samples corresponding to a given 
PL-CI-SLU into two independent subsets. The tree is iteratively grown with one of the 
subsets and pruned with the other, interchanging the roles of the two subsets in 
successive iterations. The growing procedure was identical to that described in section 
2, but without the GOS threshold. A minimum number of training samples was 
required for a node to be valid. As a second step, once a big DT was built, the pruning 
procedure applied a misclassification measure to discard leaf nodes below a given 
threshold. It can be shown that the algorithm converges after a few steps [4]. Among 
the DT building methods, G&P provides a good balance between classification 
accuracy and computational cost, compared to other methods like CART [5]. Note, 
however, that we use an alternative to the classic G&P. A new threshold must still be 
heuristically fixed to control the size of the sample sets associated with the leaf nodes, 
because a minimum number of samples is necessary for the acoustic models to be 
trainable. 

Preprocessing the data (data massaging) may improve the performance of DT 
when databases are small. In this work we have squared each 
histogram element to emphasise it, obtaining a better discrimination. 
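
The effect of this preprocessing can be seen on a toy histogram. The sketch below simply squares each count; the function name and the numbers are illustrative only. 

```python
def massage(histogram):
    """Square every histogram element so dominant labels stand out more."""
    return [count * count for count in histogram]


original = [4, 2, 1]
massaged = massage(original)
print(original, massaged)
# Ratio between the most and least frequent label before and after squaring.
print(original[0] / original[2], massaged[0] / massaged[2])
```

Squaring widens the gap between frequent and rare labels (a ratio of 4 becomes 16 in this toy case), which is one plausible reading of why discrimination improves on small databases. 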
Table 1. Recognition rates for various methodologies of selection of the sets of sublexical units 
in a speaker independent acoustic-phonetic decoding task in Basque 

Type of units  Context window size  G&P  Preprocessing   #Units  % REC 
CI-PLU         -                    -    -               28      64.01 
DT-std1        1                    -    Standard        256     71.10 
DT-std2        1                    -    Standard        217     71.45 
DT-g&p1        1                    G&P  Standard        220     71.32 
DT-g&p2        2                    G&P  Standard        234     70.99 
DT-g&p-mass    1                    G&P  Data-massaging  215     71.52 



4 The Word Models 



The construction of word models can take great advantage of the DT-CD-SLUs. In 
the linear lexicon framework applied in this work, a more consistent word model 
results from the concatenation of this kind of units. Intraword contexts are handled in 
a straightforward manner, because left and right contexts are known and DT-CD- 
SLUs guarantee a full coverage of such contexts. A challenging problem arises when 
considering between-word contexts, i.e. the definition of border units, because outer 
contexts are not known, and a lack of coverage is found for these situations. Which 
contexts should be considered outside the edges of words? A brute force approach 
would expand these border units with all the context dependent units fitting the inner 
context. This leads to an intractable combinatorial problem when dealing with a large 
search automaton. Usually, this problem is solved either by simply using context 
independent units, or by explicitly training border units [1] [2] [3] [6]. 

Two different approaches to represent inter-word context dependencies were 
considered and tested in this work. In both cases, the DT-CD-SLUs introduced in the 
previous section were used inside the words. 

a) PL-CI-SLUs were used at word boundaries. As mentioned above, this 
approach involves a low computational cost but ignores many acoustic 
influences of neighbouring phones. 

b) Decision tree-based semicontextual sublexical units (DT-SC-SLUs). 
Specific decision tree-based context dependent units were used at word boundaries 
[7]. These sets of units were specifically obtained to be context dependent towards 
the inside of the word and context independent towards the outside. They were 
obtained using binary questions about either the left context or the right context. This set was used to 
transcribe the last phone of each word. This procedure agrees with the classical 
decision tree methodology used to get context dependent units. Thus, full coverage of 
inner contexts is guaranteed while keeping outside context independence. On the other 
hand, the size of the lexicon as well as the computational cost of the search did not 
increase. 
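
The two boundary strategies can be illustrated with a sketch that transcribes a word into unit labels. The labels are hypothetical: triphone-style names such as "e-t+x" stand in for the decision-tree leaf classes, which the sketch does not compute, and applying the semicontextual unit to both word edges is our assumption. 

```python
def transcribe(phones, border="ci"):
    """Unit labels for one word; border is "ci" (approach a) or "sc" (b)."""
    units = []
    last = len(phones) - 1
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i < last else None
        if left is not None and right is not None:
            # Inner phone: both contexts are known inside the word.
            units.append(f"{left}-{p}+{right}")
        elif border == "ci" or (left is None and right is None):
            # Context independent unit at the word edge.
            units.append(p)
        elif left is None:
            # First phone, semicontextual: only the inner (right) context.
            units.append(f"{p}+{right}")
        else:
            # Last phone, semicontextual: only the inner (left) context.
            units.append(f"{left}-{p}")
    return units


word = ["e", "t", "x", "e"]
print(transcribe(word, "ci"))
print(transcribe(word, "sc"))
```

In both cases each word keeps a single transcription, so the lexicon size and the search cost stay the same; only the edge units differ. 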

5 Experimental Evaluation 



The corpus used to obtain all the DT-CD-SLUs previously presented was composed of 
10000 sentences, phonetically balanced and uttered by 40 speakers, involving around 
200000 phones. These samples were then used to train the acoustic model of each DT- 
derived context dependent unit. Discrete HMMs with four observation codebooks 
were used as acoustic models in these experiments. 

A task was created for this purpose. The Miniature Language Acquisition (MLA) task 
[8] in Basque has 15,000 sentences with about 150,000 words and a 
vocabulary size of 47. It has very low perplexity and a very restricted vocabulary. It was 
created for preliminary CSR experiments. The task then underwent an automatic 
morphological segmentation, and we created two sets of lexical units as alternatives to 
the words. We considered these new lexical units because Basque is an agglutinative 
language [9]. For the MLA task, this reduces the vocabulary size to 35 pseudo-morphemes 
(PS-MORPHS). Finally, N-WORDS, acoustically more robust units [9], were obtained, 
resulting in 40 units. The sentences of the MLA task were divided into 14,500 sentences for 
training and 500 for test. 20 speakers, 10 males and 10 females, recorded the task, 
obtaining 400 sentences. 

5.1 Acoustic-Phonetic Decoding Experiments 

Two groups of sublexical units were used in these experiments: 

The first and simplest one consisted of 24 PL-CI-SLU and it was used as a 
reference set. 

The second group of sublexical units was the DT-CD-SLUs set obtained 
through the methodology described in Section 2. Both the standard approach, with and 
without the new features described above, and the G&P approach were used to 
generate the corresponding DT-CD-SLUs. The standard approach, using a set of 
phonetic questions about one left and one right context and two different thresholds 
controlling the size of the training sets, was applied to get the sets DT-stdN. The 
standard approach, but replacing the standard data by the massaged data defined in 
section 3, was used to obtain the set DT-mass. Finally, the G&P approach was applied 
to obtain the sets DT-g&p. Results are shown in table 1. 

From these results we conclude that DT-CD-SLUs outperform the reference 
sets CI-PLU and Freq-CDU. The two new features added to the standard DT 
methodology improve the performance. In fact, the best result (71.52%), obtained for 
DT-g&p-mass, which integrates the two methods, is only slightly better than that 
obtained for DT-std1 (71.45%), but improves on the result obtained for G&P in [7] for Spanish. The G&P 
methodology ran faster than the standard one. In most cases the procedure 
converged in two steps, each step involving half the samples of the standard 
methodology, thus providing considerable time savings. 

Table 2. Word recognition rates in a Basque CSR task (MLA), without language model, for 
various sets of sublexical units and three different approaches to the definition of border units 
by using three different sets of lexical units: WORDS, PS-MORPHS and N-WORDS 

                       Units used at word boundaries 
             WORDS                 PS-MORPHS             N-WORDS 
             PL-CI-SLU  DT-SC-SLU  PL-CI-SLU  DT-SC-SLU  PL-CI-SLU  DT-SC-SLU 
CI-PLU       80.61      -          -          -          -          - 
DT-std1      86.73      87.68      84.03      83.68      69.20      72.13 
DT-g&p2      86.20      87.41      83.12      83.44      69.54      72.52 
DT-g&p-mass  86.43      90.75      83.96      84.19      69.68      73.33 


5.2 Lexical Unit-Level Experiments 

This second series of experiments was aimed at evaluating the proposed DT-CD-SLUs 
when used to build word models. Different lexicon transcriptions were applied 
according to the approach used to model word boundaries (section 4), while keeping 
DT-CD-SLUs inside words: PL-CI-SLUs and DT-SC-SLUs. 

The experiments were carried out without a grammar, and in the case of 
morphemes and N-WORDS the output was aligned to words to compare the results 
appropriately. Experimental results are shown in table 2. DT-CD-SLUs outperformed 
the reference sets PL-CI-SLUs in all cases. As expected, the use of DT-SC-SLUs at 
word boundaries led to the best results, establishing an upper bound to the benefits 
attainable by using context dependent sublexical units to build word models. This 
reveals the contribution of modelling between-word contexts to speech recognition, 
and suggests further work in that line. 

DT-g&p-mass gave the best recognition rates, being the best choice when 
handling isolated lexical units both with PL-CI-SLUs (86.43% for words, 83.96% for 
N-WORDS and 69.68% for PS-MORPHS) and with DT-SC-SLUs (90.75% for words, 
84.19% for N-WORDS and 73.33% for PS-MORPHS). Finally, the G&P methodology 
has a performance similar to the standard methodology, with a very low computational 
cost. 
6 Concluding Remarks 



The classical decision tree classification methodology was improved to obtain a 
suitable set of context dependent sublexical units for Basque CSR tasks. A data 
massaging methodology was used to emphasise differences among the samples. An 
alternative methodology, based on the fast and efficient G&P algorithm, was also 
proposed. Various sets of DT-based context dependent sublexical units were tested in 
a first series of speaker independent acoustic-phonetic decoding experiments, where 
our methodology outperformed the classical one proposed by Bahl. Two different 
strategies to handle border units in the construction of word models were described 
and tested in a second series of experiments. Results showed the potential contribution 
of modelling between-word contexts to speech recognition, and suggest further work 
in that line. 
Abstract. Classification problems are traditionally focused on uniclass 
samples, that is, each sample of the training and test sets has one unique 
label, which is the target of the classification. In many real-life 
applications, however, this is only a rough simplification and one must consider 
techniques for the more general multiclass classification problem, 
where each sample can have more than one label, as happens in our 
task. In the understanding module of a domain-specific dialogue system 
for answering telephone queries about train information in Spanish, which 
we are developing, a user turn can belong to more than one type of frame. 
In this paper, we discuss general approaches to the multiclass 
classification problem and show how these techniques can be applied by using 
connectionist classifiers. Experimentation with the data of the dialogue 
system shows the inherent difficulty of the problem, and the effectiveness 
of the different methods is compared. 



1 Introduction 

In many real pattern recognition tasks, it is convenient to perform a prior
classification of the objects in order to treat them in a specific way. For instance,
if language models can be learnt for specific sub-domains of a task, better
performance can be achieved in an automatic speech recognition/understanding
system. The aim of this work is to propose some classification techniques in
order to improve the understanding process of a dialogue system.

The task of our dialogue system consists of answering telephone queries about 
train timetables, prices and services for long distance trains in Spanish. The 
understanding module gets the output of the speech recognizer (sequences of 
words) as input and supplies its output to the dialogue manager. The semantic 
representation is strongly related to the dialogue management. In our approach, 
the dialogue behavior is represented by means of a stochastic network of dialogue 
acts. Each dialogue act has three levels of information: the first level represents 
the general purpose of the turn, the second level represents the type of semantic
message (the frame or frames), and the third level takes into account the data
supplied in the turn.

* This work has been partially supported by the Spanish CICYT under contracts
TIC2000-0664-C02-01 and TIC2002-04103-C03-03.

We focus our attention on the process of classifying the user turn in terms
of the second level of the dialogue act, that is, the identification of the frame
or frames. This classification will help us to determine the data supplied in the
sentence in a later process, where, depending on the output of the classifier, one
or more specific understanding models are applied. Our previous work on this
same topic can be found in [1,2].

In dealing with this frame detection problem, multiclass classification arose
as a natural issue in our system. A user can ask in the
same utterance about timetables and prices of a train, for example, and these
are two of the categories we have defined. This poses an interesting problem,
as most classification problems and solutions to date have focused
exclusively on the uniclass case, and few have dealt with this
kind of generalization.



2 The Uniclass Classification Problem 

Uniclass classification problems involve finding a definition for an unknown
function $k^*(x)$ whose range is a discrete set containing $|\mathcal{C}|$ values (i.e., the
$|\mathcal{C}|$ “classes” of the set of classes $\mathcal{C} = \{c^{(1)}, \ldots, c^{(|\mathcal{C}|)}\}$). The definition
is acquired by studying collections of training samples of the form

$$\{(x_n, c_n)\}_{n=1}^{N}, \quad c_n \in \mathcal{C}, \tag{1}$$

where $x_n$ is the $n$-th sample and $c_n$ is its corresponding class label.

For example, in handwritten digit recognition, the function $k^*$ maps each
handwritten digit to one of $|\mathcal{C}| = 10$ classes. The Bayes decision rule for
minimizing the probability of error is to assign the class with maximum a posteriori
probability to the sample $x$:

$$k^*(x) = \operatorname*{argmax}_{k \in \mathcal{C}} \Pr(k \mid x). \tag{2}$$

Uniclass Classification using Neural Networks. Multilayer perceptrons
(MLPs) are the most common artificial neural networks used for classification.
For this purpose, the number of output units is defined as the number of classes,
$|\mathcal{C}|$, and the input layer must hold the input samples. Each unit in the (first)
hidden layer forms a hyperplane in the pattern space; boundaries between classes
can be approximated by hyperplanes. If a sigmoid activation function is used,
MLPs can form smooth decision boundaries which are suitable for
classification tasks [3].

For uniclass samples, the activation level of an output unit can be interpreted
as an approximation of the a posteriori probability that the input sample
belongs to the corresponding class. Therefore, given an input sample $x$, the trained
MLP computes $g_k(x, \omega)$ (the $k$-th output of the MLP with parameters $\omega$ given
the input sample $x$), which is an approximation of the a posteriori probability
$\Pr(k \mid x)$. Thus, for MLP classifiers we can use the uniclass classification rule as
in equation (2):

$$k^*(x) = \operatorname*{argmax}_{k \in \mathcal{C}} \Pr(k \mid x) \approx \operatorname*{argmax}_{k \in \mathcal{C}} g_k(x, \omega). \tag{3}$$



3 The Multiclass Classification Problem 

In contrast to the uniclass classification problem, in other real-world learning
tasks the unknown function $k^*$ can take more than one value from the set of
classes $\mathcal{C}$. For example, in many important document classification tasks,
documents may each be associated with multiple class labels [4, 5]. A similar example
is found in our classification problem of dialogue acts: a user turn can be labeled
with more than one frame label. In this case, the training set is composed of
pairs of the form¹

$$\{(x_n, C_n)\}_{n=1}^{N}, \quad C_n \subseteq \mathcal{C}. \tag{4}$$

There are two common approaches to this problem of classification of objects
associated with multiple class labels.² The first is to use specialized solutions like
the accumulated posterior probability approach described in the next section.
The second is to build a binary classifier for each class, as explained afterwards.

3.1 Accumulated Posterior Probability 

In a traditional (uniclass) classification system, given an estimation of the a
posteriori probabilities $\Pr(k \mid x)$, we can think of a classification as “better
estimated” if the probability of the destination class is above some threshold (i.e.,
the classification of a sample $x$ as belonging to class $k$ is better estimated if
$\Pr(k \mid x) = 0.9$ than if it is only 0.4). A generalization of this principle can be
applied to the multiclass approximation problem.

We can consider that we have correctly classified a sample only if the sum
of the a posteriori probabilities of the assigned classes is above some threshold
$T$. Let us define this concept more formally. Suppose we have an ordering
(permutation) $(k^{(1)}, k^{(2)}, \ldots, k^{(|\mathcal{C}|)})$ of the set $\mathcal{C}$ for a sample $x$, such that

$$\Pr(k^{(i)} \mid x) \ge \Pr(k^{(i+1)} \mid x) \quad \forall\, 1 \le i < |\mathcal{C}|. \tag{5}$$

We define the “accumulated posterior probability” for the sample $x$ as

$$\Pr\nolimits_x(j) = \sum_{i=1}^{j} \Pr(k^{(i)} \mid x), \quad 1 \le j \le |\mathcal{C}|. \tag{6}$$

¹ The uniclass classification problem is a special case in which $|C_n| = 1$ for all samples.

² In certain practical situations, the number of possible multiclass labels is limited due
to the nature of the task. For instance, if we know that only a few specific
multiple labels can appear, we do not need to consider all the
possible combinations of the initial labels. In such situations we can handle this task
as a uniclass classification problem with an extended set of labels defined as a
subset of the power set $\mathcal{P}(\mathcal{C})$.
Using the above equation, we classify the sample $x$ into $n$ classes, $n$ being the
smallest number such that

$$\Pr\nolimits_x(n) \ge T, \tag{7}$$

where the threshold $T$ must also be learnt automatically in the training process.
The set of classification labels for the sample $x$ is simply

$$\mathcal{K}^*(x) = \{k^{(1)}, \ldots, k^{(n)}\}. \tag{8}$$

Accumulated Probability using MLPs. We can apply this approach using
neural networks by slightly modifying equation (7). As the outputs of the output
layer are only approximations of the a posteriori probabilities, it is possible that
their sum exceeds the value of 1, so a more suitable estimation would be³

$$|1 - \Pr\nolimits_x(n)| \le S, \tag{9}$$

where the accumulated posterior probabilities $\Pr_x(j)$ are computed as in equation (6)
by approximating the posterior probabilities with an MLP of $|\mathcal{C}|$ outputs:

$$\Pr\nolimits_x(j) = \sum_{i=1}^{j} \Pr(k^{(i)} \mid x) \approx \sum_{i=1}^{j} g_i(x, \omega), \quad 1 \le j \le |\mathcal{C}|. \tag{10}$$

The outputs $g_i(x, \omega)$ of the trained MLP are also ordered according to (5). During
the training phase, the desired outputs for the sample $x$ are the “true” posterior
probabilities of each class.⁴
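As an illustration, the selection rules (7)-(8) and the MLP variant (9) can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in the experiments: the function names, the tie-breaking by class index, the non-strict comparisons and the behaviour when no prefix of the ordering satisfies the rule are all assumptions made for the example.

```python
def accumulated_posterior_labels(posteriors, T):
    """Rules (7)-(8): smallest top-n set whose accumulated posterior reaches T."""
    # Ordering k^(1), k^(2), ... by decreasing posterior (ties broken by class index).
    order = sorted(range(len(posteriors)), key=lambda k: -posteriors[k])
    total = 0.0
    for n, k in enumerate(order, start=1):
        total += posteriors[k]          # accumulated posterior Pr_x(n)
        if total >= T:
            return sorted(order[:n])
    return sorted(order)                # assumption: keep all classes if T is never reached


def accumulated_output_labels(outputs, S):
    """Rule (9): smallest n whose accumulated MLP output lies within S of 1."""
    order = sorted(range(len(outputs)), key=lambda k: -outputs[k])
    total = 0.0
    for n, k in enumerate(order, start=1):
        total += outputs[k]
        if abs(1.0 - total) <= S:
            return sorted(order[:n])
    return []                           # assumption: no label set is returned otherwise


print(accumulated_posterior_labels([0.05, 0.5, 0.4, 0.05], T=0.8))
print(accumulated_output_labels([0.01, 0.97, 0.95, 0.0], S=0.1))
```

In the second call two output units are close to 1, yet the rule stops after the first one because the accumulated value is already within S of 1, so only a single class is returned.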



3.2 Binary Classifiers 



Another possibility is to treat each class as a separate binary classification
problem (as in [6-8]). Each such problem answers the question of whether a sample
should be assigned to a particular class or not.

For $C \subseteq \mathcal{C}$, let us define $C[c]$ for $c \in \mathcal{C}$ to be

$$C[c] = \begin{cases} \text{true}, & \text{if } c \in C; \\ \text{false}, & \text{if } c \notin C. \end{cases} \tag{11}$$



A natural reduction of the multiclass classification problem is to map each
multiclass sample $(x, C)$ to $|\mathcal{C}|$ binary-labeled samples of the form $((x, c), C[c])$
for all $c \in \mathcal{C}$; that is, each sample is formally a pair, $(x, c)$, and the associated
binary label, $C[c]$. In other words, we can think of each observed class set $C$ as
specifying $|\mathcal{C}|$ binary labels (depending on whether a class $c$ is included in $C$
or not), and we can then apply uniclass classification to this new problem. For
instance, if a given training pair $(x, C)$ is labeled with the classes $c^{(i)}$ and
$c^{(j)}$, that is, $(x, \{c^{(i)}, c^{(j)}\})$, then $|\mathcal{C}|$ binary-labeled samples are defined as
$((x, c^{(i)}), \text{true})$, $((x, c^{(j)}), \text{true})$ and $((x, c), \text{false})$ for the rest of
classes $c \in \mathcal{C}$.

³ Note the different interpretation of the threshold value in equations (7) and (9).
In the first one, $T$ represents the probability mass that we must have for correctly
classifying a sample, whereas in the second one $S$ is a measure of the distance to the
“ideal” classification with a posteriori probability value of 1.

⁴ Nevertheless, a simplification is assumed: as the true posterior probabilities usually
cannot be known, we consider all the classes of a training sample equally probable.

Then a set of binary classifiers is trained, one for each class. The $i$-th classifier
is trained to discriminate between the $i$-th class and the rest of the classes, and
the resulting classification rule is

$$\mathcal{K}^*(x) = \{k \in \mathcal{C} \mid \Pr(k \mid x) \ge T\}, \tag{12}$$

where $T$ is a threshold which must also be learnt.

Binary Classification Using MLPs. Let $(\omega_1, \ldots, \omega_{|\mathcal{C}|})$ be the MLP classifiers
trained as in the uniclass case. Furthermore, let $g(x, \omega_i)$ be the output of the
$i$-th MLP classifier when given an input sample $x$. New samples are classified by
setting the predicted class or classes to be the indices of the classifiers whose
estimated posterior probability reaches the threshold $T$:

$$\mathcal{K}^*(x) = \{k \in \mathcal{C} \mid \Pr(k \mid x) \ge T\} \approx \{k \in \mathcal{C} \mid g(x, \omega_k) \ge T\}. \tag{13}$$

An alternative approach is to assign a binary string of length $|\mathcal{C}|$ to each class
$c \in \mathcal{C}$ or set of classes $C \subseteq \mathcal{C}$. During training, for a pattern from classes
$c^{(i)}$ and $c^{(j)}$, for example, the desired outputs of these binary functions are
specified by the corresponding units for classes $i$ and $j$. With MLPs, these binary
functions can be implemented by the $|\mathcal{C}|$ output units of a single network.

In this case, the multiclass classification rule is redefined as follows: an input sample
$x$ is classified in the classes $\mathcal{K}^*(x)$ whose a posteriori probability is above a
threshold $T$:

$$\mathcal{K}^*(x) = \{k \in \mathcal{C} \mid \Pr(k \mid x) \ge T\} \approx \{k \in \mathcal{C} \mid g_k(x, \omega) \ge T\}, \tag{14}$$

where $g_k(x, \omega)$ is the $k$-th output of an MLP classifier with parameters $\omega$ given the
input sample $x$.
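Rule (14) amounts to thresholding each output unit independently. A minimal Python sketch, assuming the network outputs are available as a list of floats indexed by class and that the comparison with the threshold is non-strict (the function name is hypothetical):

```python
def threshold_labels(outputs, T):
    """Return the indices of all classes whose output reaches the threshold T."""
    return [k for k, g in enumerate(outputs) if g >= T]


# Two output units are active, so the sample receives two frame labels.
print(threshold_labels([0.10, 0.93, 0.88, 0.02], T=0.5))
```

The same function applies to rule (13) if the list holds the outputs of the separately trained binary classifiers, one per class. Unlike the accumulated probability rule, the result may be empty when no output reaches the threshold.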

4 The Dialogue Task 

The final objective of our dialogue system is to build a prototype for information 
retrieval by telephone for Spanish nation-wide trains [9] . Queries are restricted to 
timetables, prices and services for long distance trains. A total of 215 dialogues 
were acquired using the Wizard of Oz technique. From these dialogues, a total 
of 1440 user turns (14 923 words with a lexicon of 637 words) were obtained. 
The average length of a user turn is 10.27 words. All the utterances we used for 
our experiments were transcribed by humans from the actual spoken responses. 

The turns of the dialogue were labelled in terms of three levels [10]. An
example is given in Figure 1. We focus our attention on the most frequent second
level labels, which are Affirmation, Departure_time, New_data, Price, Closing,
Return_departure_time, Rejection, Arrival_time, Train_type and Confirmation. Note that
each user turn can be labeled with more than one frame label⁵ (as in the example).

⁵ In related works of dialogue act classification [11], a hand-segmentation of the user
turns was needed in order to have sentence-level units (utterances) which corre- 



Uniclass and Multiclass Connectionist Classification of Dialogue Acts



Original sentence        Hello, good morning. I would like to know the price and
                         timetables of a train from Barcelona to La Coruna for the
                         22nd of December, please.
1st level (speech act)   Question
2nd level (frames)       Price, Departure_time
3rd level (cases)        Price (Origin: barcelona, Destination: la_coruna,
                         Departure_time: 12/22/2003)
                         Departure_time (Origin: barcelona, Destination: la_coruna,
                         Departure_time: 12/22/2003)

Fig. 1. Example of the three-level labeling for a multiclass user turn. Only the English
translation of the original sentence is given.

For classification and understanding purposes, we are concerned with the 
semantics of the words present in the user turn of a dialogue, but not with the 
morphological forms of the words themselves. Thus, in order to reduce the size 
of the input lexicon, we decided to use categories and lemmas. In this way, we 
reduced the size of the lexicon from 637 to 311 words. Then, we discarded those 
words with a frequency lower than five, obtaining a lexicon of 120 words. 

We think that for this task the sequential structure of the sentence is not
fundamental to classifying the type of frame.⁶ For that reason, the words of the
preprocessed sentence were all encoded with a local coding: a 120-dimensional
bit vector with one position for each word of the lexicon. When a word appears in
the sentence, its corresponding unit is set to 1; otherwise, it is set to 0.
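The local coding can be sketched as follows. This is only an illustration: the toy four-word lexicon stands in for the 120-word lexicon, and the function name and the decision to ignore out-of-lexicon words are assumptions.

```python
def encode_turn(words, lexicon):
    """Encode a preprocessed user turn as a bit vector over the lexicon."""
    position = {word: i for i, word in enumerate(lexicon)}
    vector = [0] * len(lexicon)
    for word in words:
        i = position.get(word)
        if i is not None:          # assumption: words outside the lexicon are ignored
            vector[i] = 1          # presence only: order and repetitions are discarded
    return vector


lexicon = ["price", "train", "timetable", "barcelona"]
print(encode_turn(["price", "of", "train", "train"], lexicon))
```

Because only the presence of a word is recorded, two turns containing the same lexicon words in a different order receive the same input vector.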

4.1 Codification of the Frame Classes 

For the uniclass problem we used the usual “1-of-$|\mathcal{C}|$” coding: the desired output
for each training sample is set to 1 for the one frame class that is correct and
0 for the remainder. The codification in the multiclass problem is different for
each approach:

Binary classification with $|\mathcal{C}|$ MLPs. The target of the training sample is 1
if the sample belongs to the class of the MLP classifier, and 0 if not.

Binary classification with one MLP. The target of the training sample is
coded with a $|\mathcal{C}|$-dimensional vector: the desired outputs for each training
sample $(x_n, C_n)$ are set to 1 for those (one or more) frame classes that are
correct and 0 for the remainder.

Accumulated posterior probability. The target of the training sample is
coded with a $|\mathcal{C}|$-dimensional vector: the desired outputs for each training
sample $(x_n, C_n)$ are set to $1/|C_n|$ for those (one or more) frame classes that
are correct and 0 for the remainder.
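The target codings above can be written down compactly. The following Python sketch assumes that classes are numbered from 0 and that the label set of a sample is given as a Python set; the function names are hypothetical.

```python
def one_of_c_target(label, num_classes):
    """Uniclass coding: 1 for the single correct class, 0 for the remainder."""
    return [1 if k == label else 0 for k in range(num_classes)]


def multi_hot_target(labels, num_classes):
    """Single-MLP binary coding: 1 for every correct frame class, 0 otherwise.

    With one MLP per class, the k-th element is the scalar target of the k-th classifier.
    """
    return [1 if k in labels else 0 for k in range(num_classes)]


def accumulated_target(labels, num_classes):
    """Accumulated posterior coding: the correct classes share the mass equally."""
    return [1.0 / len(labels) if k in labels else 0.0 for k in range(num_classes)]


print(one_of_c_target(2, 4))
print(multi_hot_target({1, 3}, 4))
print(accumulated_target({1, 3}, 4))
```

For a uniclass sample the three codings coincide, since the single correct class receives the value 1 in each of them.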

sponded to a unique dialogue act. The relation between user turns and utterances 
was also not one-to-one: a single user turn can contain multiple utterances, and
utterances can span more than one turn. After the hand-segmentation process, each
utterance unit was identified with a single dialogue act label.

⁶ Nevertheless, the sequential structure of the sentence is essential in order to segment
the user turn into slots to have a real understanding of it.
5 Experiments

The dataset is composed of 1 338 user turns after discarding the sentences labeled
with the less frequent frame classes. We split the corpus into two datasets: the
first contains only the uniclass turns (867 samples), and the second is the
complete corpus, which comprises both uniclass and multiclass turns (1 338 samples).
For each type of experiment, the dataset was randomly split so that about 80%
of the user turns are used for training and the rest for testing, while guaranteeing
that each frame class is represented in both the training and the test set.

5.1 Training the Neural Networks 

With any neural network algorithm, several parameters must be chosen by the
user. For the MLPs, we must select the network topology and its initialization,
the training algorithm and its parameters, and the stopping criteria [3, 12, 13].
We selected all the parameters to optimize performance on a validation set: the
training set is subdivided into a subtraining set and a validation set (20% of the
training data). While training on the subtraining set, we observed generalization
performance on the validation set (measured as the mean square error) to
determine the optimal configuration and the best point at which to
stop training. The thresholds $S$ and $T$ of the different multiclass classification
rules were also learnt in the training process: we performed classification with
the optimal MLP configuration on the patterns of the validation set, trying
several values of the thresholds and keeping the best one.
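The threshold search on the validation set can be sketched as below. The text does not state which score decides the "best" threshold, so this sketch assumes the exact-match rate (a turn counts as correct only if the predicted label set equals the reference set); the function names and the candidate grid are likewise assumptions.

```python
def predict_labels(outputs, T):
    """Thresholding rule: all classes whose output reaches T."""
    return {k for k, g in enumerate(outputs) if g >= T}


def exact_match_rate(predicted, reference):
    """Fraction of samples whose predicted label set equals the reference set."""
    hits = sum(1 for p, r in zip(predicted, reference) if set(p) == set(r))
    return hits / len(reference)


def select_threshold(val_outputs, val_labels, candidates):
    """Try each candidate threshold on the validation set and keep the best one."""
    best_T, best_rate = None, -1.0
    for T in candidates:
        predictions = [predict_labels(g, T) for g in val_outputs]
        rate = exact_match_rate(predictions, val_labels)
        if rate > best_rate:       # ties keep the first candidate tried
            best_T, best_rate = T, rate
    return best_T, best_rate


val_outputs = [[0.9, 0.1, 0.6], [0.2, 0.8, 0.3], [0.7, 0.65, 0.1]]
val_labels = [{0, 2}, {1}, {0, 1}]
print(select_threshold(val_outputs, val_labels, [0.3, 0.5, 0.7]))
```

A threshold that is too low adds spurious labels and one that is too high drops correct ones; under the exact-match criterion both kinds of error count as a misclassified turn.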

5.2 UC and MC Experiments 

Table 1 shows the selected topology and the classification error rate for each of the
experiments. For the UC experiment, we used only the uniclass user turns
(867 samples). For the MC experiments, we consider a sample as correctly classified
only if the exact set of original frame classes is detected. That is, if a user turn is
labeled with two frame classes, only and exactly those classes should be detected.

In the Accumulated Probability case, when applying classification rule (9)
with a threshold $S$ close to 0, that is, when the accumulated probability is required
to be close to 1, the error rate was very high: (nearly) all the multiclass
samples were misclassified. By analyzing the MLP outputs, we observed that when one
or more classes are detected, each of the corresponding output values is close to one.
Therefore, the MLP with a sigmoid activation function is unable to learn the
true probability distribution across the whole set of classes. For this reason, we
decided to apply the classification rule given in equation (14).

Table 1. Classification error rates (%) for the UC and MC experiments.

Experiment                            Topology       Total   Uniclass   Multiclass
UC experiment                         120-64-64-10    9.14      9.14        -
MC experiments
  Binary classifiers with |C| MLPs    120-8-1        17.91     13.71      25.80
  Binary classifiers with 1 MLP       120-32-32-10   11.19      7.43      18.28
  Accumulated Probability             120-32-16-10   14.55      8.57      25.81

6 Discussion and Conclusions

This work is an attempt to show the differences between uniclass and multiclass
classification problems applied to detecting dialogue acts in a dialogue system.
We experimentally compare three connectionist approaches to this end: accumulated
posterior probability, multiple binary classifiers and one extended
binary classifier. The results clearly show that, firstly, multiclass classification
is much harder than uniclass classification (error rates of 18.28% to 25.81% on
multiclass turns versus 7.43% to 13.71% on uniclass turns in the MC experiments)
and, secondly, the best performance is obtained using one extended binary classifier.

On the other hand, the results obtained for classifying dialogue acts also 
show that using a connectionist approach is effective for classifying the user turn 
according to the type of frames. This automatic process will be helpful to the 
understanding module of the dialogue system: firstly, the user turn, in terms of 
natural language, is classified into a frame class or several frame classes; secondly, 
a specific understanding model for each type of frame is used to segment and fill 
the cases of each frame. 
Abstract. In this paper, the possibility of developing a new criterion for the
diagnostics of hematopoietic tumors, such as chronic B-cell lymphatic leukemia,
transformation of chronic B-cell lymphatic leukemia into lymphosarcoma, and
primary B-cell lymphosarcoma, from images of cell nuclei of lymphatic nodes is
considered. A method for image analysis of lymphatic node specimens is developed
on the basis of the scale space approach. A diagnostically important criterion
is defined as the total number of points of spatial intensity extrema in the
families of blurred images generated by the given image of a cell nucleus. The
procedure for calculating criterion values is presented.



1 Introduction 

A large amount of research in image processing and analysis is directed at the
development of medical diagnostics. A promising new trend has recently appeared,
concerned with the development of diagnostic techniques for automated analysis of
the morphology of blood cells and hematopoietic organs using microscopic images.
In this paper, a relatively small sample of images is used for obtaining a
criterion for the diagnostics of hematopoietic tumors, such as chronic B-cell lymphatic
leukemia, its transformation to lymphosarcoma, and primary B-cell lymphosarcoma
(according to the classification of A. Vorob’ev and M. Brilliant [5]).

Experts in hematology have found that cell nuclei in tissue specimens of
lymphatic nodes taken from patients diagnosed with a malignant tumor are larger than
those taken from patients diagnosed with a non-malignant tumor. Thus, an obvious
diagnostic criterion is the area of the cell nucleus. This criterion, however, is
unsuitable for more accurate diagnostics: it cannot distinguish between such diseases
as transformation of chronic lymphoid leukemia and lymphosarcoma.

The procedure of searching for a diagnostic criterion includes the following steps.
First, the experts indicate the diagnostically important cell nuclei in the images of
lymphatic node specimens of three groups of patients having the diagnosed diseases.
These images are considered as the input information. Next, the developed method of
specimen image analysis is used for calculating quantitative characteristics (features)
from the indicated nuclei. The obtained values are analyzed, and this gives an
opportunity to formulate a criterion for making diagnostic decisions. The proposed
method for specimen image analysis is based on the known scale space approach [1,3,4].



2 Properties of Cell Images and Requirements to the Method 

The image of a lymphatic node specimen is a color image (24 bits per pixel) enlarged
by a microscope and taken by a camera. The size of the image is 1536x1024 pixels,
covering a site of 60-100 microns in diameter. The resolution is 0.06 microns per
pixel. The analyzed objects are the fragments of the gray-scale specimen images
containing cell nuclei. These images are characterized by inhomogeneous coloring and by
the presence of dark spots and bright areas representing their internal structure.

For diagnostics, experts pay special attention to cells of two classes: mature
cells, with the mature structure of chromatin (see Fig. 1), and sarcoma cells, with the
immature structure of chromatin (see Fig. 2) [5-7]. In the first case (chronic lymphatic
leukemia), with few exceptions, the image contains only mature cells. In the case of
sarcoma transformation of chronic lymphatic leukemia, the specimen contains
both mature and immature (sarcoma) cells. In the case of primary lymphatic sarcoma,
sarcoma cells prevail in the image.

It is necessary to take into account such specific properties of cell images as low
dye quality, instability of specimen characteristics, non-uniformity of specimen light
exposure during microscopy, and the presence of damaged cells and cells unsuitable for
analysis. Mature chromatin is homogeneous with light furrows. Immature chromatin can
have a filamentous structure of different patterns, a fibrous structure, or a granular
structure [7]. The analysis of cell nucleus images should yield quantitative
characteristics that capture the structure and pattern of chromatin.

In view of the specific properties of cell images listed above, the following
requirements for the method are formulated: (a) suitability for selection of features for
classification of cell images; (b) resistance to noise caused by the image acquisition
process and specimen quality; (c) resistance to errors and noise of image processing
algorithms; (d) correspondence of classification results to expert estimations.

The quantitative analysis of cytological and tissue specimen images is based on the
evaluation of shape, intensity, and textural features. In practice, great attention is
paid to automated analysis of chromatin arrangement in the cell nuclei. It has been
shown in many studies that chromatin distribution corresponds to the state of
malignancy. Two basic approaches to the analysis of chromatin constitution are known [9].
In the first, structural approach, the chromatin distribution is considered as a local
arrangement of rather small objects of varying intensity, and the intensity features of
dark and bright particles are evaluated. This approach is substantially heuristic. The
second, textural approach is based on the statistical characteristics of chromatin
arrangement and is related to analysis of the regularities of chromatin structure.
Methods for textural analysis applied in practice use grey level dependency matrices [8],
co-occurrence, run-length features, rice-field operators, and watersheds (topological
methods) [10], heterogeneity, clumpiness, margination, and radius of particles [12]
(the Mayall/Young features), and invariant features (polynomial invariants).

Fig. 1. Grayscale images of mature cell tumor

Fig. 2. Grayscale images of sarcoma cells of lymphatic nodes: filamentous structure of
chromatin (a); granular (b); fibrous (c)

The main disadvantage of known textural methods [9] is their sensitivity to the
parameters and conditions of image acquisition, to the properties of the examined
preparations, and to the precision of microscope focusing.
Below, we consider a method for analysis of cell nucleus images that was used for
searching for a diagnostic criterion. The proposed method combines features of both
approaches: on the one hand, the intensity features of the chromatin particles are
analyzed, and on the other hand, the diagnostic criterion is formulated in terms of a
simple quantitative characteristic describing the chromatin structure of cell nuclei:
the number of intensity extrema in the families of blurred cell nuclei images. This
feature is related to the number of chromatin particles and characterizes the state of
malignancy.



3 Method for Analysis of Nuclei Images 

Among contemporary approaches to image analysis, the Gaussian scale space approach
fully meets the requirements listed above [1,3,4]. The scale space technique
provides invariance with respect to shift, rotation, scaling, and linear
transformations of intensity. It decreases the sensitivity of the analysis to microscope
focusing. The concept of scale space gives a natural way to represent an input
image $L(x)$ ($L(x)$ is the intensity as a function of the spatial coordinates) at finite
resolution by convolving it with a Gaussian kernel $G(x;t)$ of various widths, thus
obtaining a smoothed image $L(x;t)$ at a scale determined by the kernel width $\sigma$,
which is set by the scale parameter $t$. $L(x;t)$ satisfies the heat equation, which
generates a family of blurred images [3]. As $t$ increases, the blur effect grows and
fine details of the image are lost. The properties of the constructed scale space
reflect properties of the initial image. Scale space properties are explored by
localizing its critical points.

The proposed method for cytological specimen analysis consists in construction of 
a family of blurred images (scale space) for various t and selection of diagnostic crite- 
rion using localization of scale space critical points. Critical points reflect the internal 
structure of the objects in the image, and exploration of the entire family of derived 
images allows one to analyze both fine details and large structural elements. 

3.1 Main Objectives of the Proposed Method 

Taking into account the concept of the scale space approach, the following problems
should be solved during cell image analysis and selection of a diagnostic criterion: (a)
construction of the one-parameter family of derived images $L(x;t)$ (a scale space)
from the initial image $L(x)$ for different diseases; (b) extraction of critical points in
scale space images; (c) analysis of the spatial distribution of critical points for
different groups of patients for diagnostic criterion selection; (d) calculation of
diagnostic criterion values.

3.2 Construction of a Family of Blurred Images 

For construction of a family of blurred images, it is necessary to determine the range
and the step of the scale parameter $t$. The computational experiments have shown that
the analysis of critical points in the image family is expedient for values of the scale
parameter in the range $8 < t < 60$. The step of the scale parameter should be taken in
the range $0.005 < \Delta t < 0.032$. As a result, the families of blurred images
corresponding to the scale spaces of malignant and non-malignant cell nuclei images were
constructed.
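The construction of such a family can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in the experiments. It assumes the common convention in which the kernel width is σ = √(2t) in pixel units, truncates the kernel at 4σ, replicates border pixels, and uses a default step of 1.0 only to keep the example cheap; all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 4 sigma."""
    radius = int(np.ceil(4.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()


def blur(image, t):
    """Blur a 2-D image at scale t (assumed convention: sigma = sqrt(2 t))."""
    kernel = gaussian_kernel_1d(np.sqrt(2.0 * t))
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    smooth = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(smooth, 1, padded)   # filter along the rows
    return np.apply_along_axis(smooth, 0, rows)     # then along the columns


def scale_space(image, t_min=8.0, t_max=60.0, step=1.0):
    """Return the scale values and the family of blurred images L(x; t)."""
    t_values = np.arange(t_min, t_max + 0.5 * step, step)
    return t_values, [blur(image, t) for t in t_values]


# A single bright pixel spreads out and flattens as t grows.
impulse = np.zeros((41, 41))
impulse[20, 20] = 1.0
t_values, family = scale_space(impulse, 8.0, 60.0, 26.0)
print(t_values, [round(float(img.max()), 5) for img in family])
```

The Gaussian is separable, so the sketch filters rows and columns in turn instead of building a two-dimensional kernel.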

3.3 Localization of Spatial Critical Points 

According to the scale space approach, the derived families of images were explored 
for detection of critical points. 



Fig. 3. Spatial gradients ($L_{x_1}(x;t)$ OR $L_{x_2}(x;t)$) at $t = 32$ (a) and extracted
closed curves around extrema in a single scale-space image (b) (negative images)




For localization of critical points within the proposed method, the topological
properties of iso-intensity manifolds in the neighborhoods of critical points [1] are
used. The algorithm for selection of closed loops (curves of nonzero gradient values
bounding iso-intensity curves around points of extremum) is applied. A special procedure
that includes standard image processing operations was developed. It consists of the
following steps:

1. The following operations are carried out for each image in the family: (a) logical
summation of the images $L_{x_1}(x;t)$ and $L_{x_2}(x;t)$; (b) thresholding of the resulting
image; (c) removal of the “rubbish”; (d) overlaying of a nucleus mask to restrict the
area of interest and remove the residual noise at the periphery of the nucleus.

2. All scale space images processed at Step 1 are overlaid (logical OR). 

3. The morphological operations are applied in order to fill regions bounded by closed 
curves of nonzero gradient and to remove residual rubbish. 

4. The coordinates of the geometrical centers of the filled regions (the neighborhoods 
of extrema) are found. 

5. The total amount of the centers of the filled regions is calculated. 
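For orientation, the quantity obtained at Step 5 can be approximated with a much simpler substitute, sketched below. Note that this is not the closed-curve procedure of Steps 1-5: it detects strict local maxima and minima directly in each blurred image by comparison with the eight neighbours, overlays the detections of all scales with a logical OR, and counts the connected regions. The function names, the 8-neighbourhood and the exclusion of border pixels are assumptions of the sketch.

```python
import numpy as np


def local_extrema_mask(img):
    """True where a pixel is strictly above or strictly below all 8 neighbours."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")   # border pixels see a copy of themselves, so they never qualify
    neighbours = np.stack([p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if (dy, dx) != (0, 0)])
    return (img > neighbours).all(axis=0) | (img < neighbours).all(axis=0)


def count_components(mask):
    """Count 8-connected regions of True pixels with a simple flood fill."""
    mask = mask.copy()
    h, w = mask.shape
    count = 0
    for y, x in zip(*np.nonzero(mask)):
        if not mask[y, x]:
            continue                  # already absorbed into an earlier region
        count += 1
        mask[y, x] = False
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx]:
                        mask[ny, nx] = False
                        stack.append((ny, nx))
    return count


def count_extrema(family, nucleus_mask=None):
    """Overlay the extrema of all blurred images (logical OR) and count the regions."""
    union = np.zeros(np.asarray(family[0]).shape, dtype=bool)
    for img in family:
        union |= local_extrema_mask(img)
    if nucleus_mask is not None:
        union &= np.asarray(nucleus_mask, dtype=bool)
    return count_components(union)


yy, xx = np.mgrid[0:32, 0:32]
img = (np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / 18.0)
       + np.exp(-((yy - 20) ** 2 + (xx - 22) ** 2) / 18.0))
print(count_extrema([img]))
```

Counting connected regions of the overlay, rather than individual pixels, keeps an extremum that drifts slightly between neighbouring scales from being counted several times.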
Fig. 4. Neighborhoods of scale space spatial extrema (a), centers of extrema neighborhoods (b)

In Figs. 3, 4, the steps of extrema localization are illustrated. In Fig. 3 (a), the logical 
“OR” of spatial gradient images at t=32 is presented. In Fig. 3 (b), one can see the 
extracted closed curves around extrema at t=32. In Fig. 4 (a), the neighborhoods of 
extrema for the whole family of scale space images are filled with black and, in 
Fig. 4 (b), the centers of colored areas are presented. 

3.4 Selection of Diagnostic Criterion 

The developed procedure was applied to the analysis of scale spaces generated by
images of lymphatic node specimens for diagnoses of malignant (primary B-cell
lymphosarcoma and transformation of chronic B-cell lymphocytic leukemia) and
non-malignant (chronic B-cell lymphocytic leukemia) tumours. In total, 86 images of cell
nuclei from 25 patients were analyzed. Families of blurred images for the scale parameter
in the range $12.5 < t < 50$ were generated and explored. Table 1 gives the range of
the number of spatial extrema for the various diagnoses.

Using the results of the experiments, a chart was created that displays two characteristics
of the cell nuclei images: the total number of spatial extrema n and the area of a nucleus s
(see Fig. 5). The chart area includes three significant parts: (I) the
area located to the left of s = 137 and below n = 60; (II) the area where
137 < s < 200 and n > 60; (III) the area to the right of s = 200.
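
The three parts of the chart can be written as a small lookup. This is only a sketch of the partition as stated in the text: the function name `chart_region` is hypothetical, and points that fall on a boundary value or outside the three described parts are returned as `None` because the text does not assign them.

```python
def chart_region(s, n):
    """Return 'I', 'II', 'III', or None for a nucleus with area s and n extrema."""
    if s > 200:
        return "III"                      # to the right of s = 200
    if 137 < s < 200 and n > 60:
        return "II"                       # 137 < s < 200 and n > 60
    if s < 137 and n < 60:
        return "I"                        # left of s = 137 and below n = 60
    return None                           # not assigned by the description
```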

The first area mainly contains the points corresponding to the diagnosis of chronic
lymphocytic leukemia (CLL). In the second area, transformation of chronic
lymphocytic leukemia (TRCLL) dominates. The third area contains transformation of
chronic lymphocytic leukemia as well as lymphosarcoma (LS). For classification of
cell nuclei located in area (III), it is possible to construct a separating function. The
spread of points in region (III) of Fig. 5 is caused by the different types of structure
and pattern of chromatin in the malignant cell nuclei (see Fig. 2). Therefore, a more
accurate classification requires analysis of critical points for the different types of
chromatin structure.
Table 1. Range of the total number of spatial extrema for specimen images corresponding to
various diagnoses

Diagnosis                                                          Min    Max
Chronic lymphocytic leukemia                                        23     60
Transformation of chronic lymphocytic leukemia                      44    216
Lymphosarcoma                                                       59    175
Transformation of chronic lymphocytic leukemia and lymphosarcoma    44    216




Fig. 5. Distribution of cell nuclei in the coordinates “nucleus area”, “amount of extrema” (s, n).
Legend: ♦ CLL, ■ LS, ▲ TRCLL

The results presented in Fig. 5 and in Table 1 allow us to conclude that the total
number of spatial extrema in cell nuclei images may be used as a diagnostic criterion.
A special technique for calculating the diagnostic criterion value was developed and
implemented in the “Black Square” software system [2].



4 Conclusions and Directions of Further Research 

We considered the possibility of developing a new criterion for the diagnosis of
hematopoietic tumors from images of cell nuclei of lymphatic nodes. The results are as
follows.

1. A method for the analysis of images of lymphatic node specimens was developed.

2. A diagnostically important criterion was obtained; it is defined as the total number of
spatial extrema in the scale space generated by the image of a cell nucleus.

3. A technique for calculating the diagnostic criterion value was developed and
integrated into the “Black Square” [2] system library.

Further research will be aimed at (a) increasing the precision of critical point
localization; (b) selecting diagnostic criteria based on the analysis of all types of
critical points and their evolution with increasing scale parameter; (c) enlarging
the sample of cell images. At the final stage of the research, the decision rules for making
diagnostic decisions will be formulated.
Abstract. This paper presents a tool that uses image segmentation and
morphometric methods to evaluate testicular toxicity through the analysis of
histological sections of mouse testis. The tool is based on deformable models
(Snakes) and includes several adaptations to solve important difficulties of 
histological sections imaging, mainly the low contrast edges between the 
boundary tissue of seminiferous tubules and the interstitial tissue. The method 
is designed to produce accurate segmentation and to keep track of tubular 
identities on images under study. The extracted data can be used 
straightforwardly to compute quantitative parameters characterizing tubular 
morphology. The method was validated on a realistic data set and the results 
were compared with those obtained with traditional techniques. The new
technique facilitates measurement, allowing a higher number of tubules to be
assessed in a faster and more accurate way.



1 Introduction 

Histopathology is considered the most sensitive endpoint for evaluating testicular
toxicology. One of the histopathological signs of testicular toxicity is tubular
contraction or dilation. Tubular contraction occurs as a result of reduced fluid
secretion and a consequent reduction in the overall diameter of the seminiferous
tubule. Chlordane is one of the substances that induce a reduction in the diameter of
seminiferous tubules [1]. Tubular dilation, with dilation of the tubular lumen, may also
occur as a result of an increase of the fluid volume in the lumen, or as a consequence
of obstruction of fluid flow. For example, carbendazim causes obstruction of the
ductular system that results in a severe and diffuse dilation of the seminiferous tubules
[2]. The dilation of the tubular lumen may be a diffuse change and may not be
obvious by microscopic observation [3], so quantitative analysis measuring tubule
diameters or tubule area may be a sensitive way to detect tubule dilation.

In this paper, we present a new technique, motivated by the desire to study
histological testicular sections for evaluating testicular toxicology. Our method is
based on deformable models (snakes) and includes several adaptations to solve
important difficulties posed by histological section imaging, mainly the low contrast
edges between the boundary tissue of seminiferous tubules and the interstitial tissue.
First, a median filter and morphological operations are applied to enhance the image
contrast and the differentiation of objects (seminiferous tubules) from the background.
Second, a local edge detector is applied to obtain an approximation of the object
contours. Segmentation is completed using an adapted variant of a gradient vector
flow (GVF) model [4], [5]. A smoothed instance of the final contour is then obtained
through spline approximation based on the detected edge points. Morphometric
parameters such as area and diameter are then easily computed. Our method
was validated on a realistic data set and the results were compared with those obtained with
traditional techniques.

In section 2, we describe preprocessing steps designed to enhance the detection of 
seminiferous tubules, where we introduce a new local edge detector based on the use 
of local average of differences between pixels and the median of neighborhood pixels. 
Section 3 recalls the original snake model [6], as well as a useful and popular 
extension, the GVF model [4], [5], and discusses their applicability to our data. 
Computation of features and measurements such as diameters and area are described 
in section 4. In section 5, we show results of our method and briefly present an 
algorithm prototype to study tubular testicular sections. Section 6 concludes with a 
short summary of our work. 



2 Preprocessing 

Male mice were used in this study. After sacrifice, the left testes were removed for
histological studies. Small pieces of testis tissue were fixed in Bouin’s fluid,
dehydrated and embedded in paraffin wax. Transverse tissue sections 5 microns
thick were cut with a microtome (Leitz model 1512) and stained with haematoxylin
and eosin (H&E). Images were acquired with the Leica IM 100 Image Manager
software. The image capture hardware consisted of a Leica DC 200
camera attached to a Leitz Laborlux K microscope. However, our prototype accepts
as input any true color image in jpg or bmp format. The difficulty in detecting tubular
testicular sections stems from the fact that they cannot be distinguished based only on
their gray level or gradient values, or by using a simple combination of smoothing and
edge detection filters. To address these problems we experimented with a combination of
noise suppression filters, such as average and median filters with different mask sizes, and
histogram thresholding, followed by binarization and edge tracking [7]. Experiments
were also carried out with edge map detectors [8]. However, these methods frequently fail
where tubular sections are out of the focal plane or where there is background
contamination. We propose a new approach that combines median filtering,
morphological processing and a newly developed edge detector to produce an edge map
to be processed by the GVF snake [4], [5].
2.1 Initial Image Preparation

The presence of noise in images represents an irrecoverable loss of information. The 
median type filters are widely used for noise suppression in early stages of a vision 
system [9], due to the following properties: 

• They preserve the ramp edges and boundaries of the objects,

• They suppress impulses of short duration without significantly modifying other
components, and

• They can be implemented easily and run fast.

The initial image preparation consisted of the following steps: convert the input image I
(a true color image) to a 256 gray level image, and then build the median
image M by applying the median filter with a mask of size 5x5.
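
The two preparation steps can be sketched in pure Python as below. This is an illustration under stated assumptions rather than the prototype's code: the function names are hypothetical, the gray conversion uses ITU-R BT.601 luma weights (the paper does not say which conversion was used), and image borders are handled by replicating the edge pixels.

```python
from statistics import median

def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to gray levels in 0..255.

    Assumption: ITU-R BT.601 luma weights; the paper does not name them.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]

def median_filter(image, size=5):
    """Median filter with a size x size mask; borders are replicated (assumption)."""
    rows, cols, half = len(image), len(image[0]), size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[min(max(r + dr, 0), rows - 1)]
                           [min(max(c + dc, 0), cols - 1)]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)]
            out[r][c] = median(window)
    return out
```

With the 5x5 mask, an isolated impulse is replaced by the value of its surroundings, while a step edge between two large flat areas is left in place.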

2.2 Morphological Operations 

Mathematical morphology is a novel geometry-based technique for image processing 
and analysis, originally developed to process binary images, based on the use of 
simple concepts from set theory and geometry such as set inclusion, intersection, 
union, complementation, and translation [10]. This resulted in a collection of tools, 
called morphological operators, which are eminently suited for the analysis of shape 
and structure in binary images [11]. The most well-known of these operators are 
erosion and dilation. Soon mathematical morphology was extended to grey-scale 
images. To extend binary morphology to grey-scale images different approaches have 
been proposed. Our work relied on the threshold set approach, in which a grey-scale 
image is decomposed in terms of its threshold (or level) sets. To each of these sets one 
can apply a binary operator, after which the resulting sets can be used to synthesize a 
transformed gray scale image [11]. 

A morphological operator applied to a binary image may be regarded as a
binary convolution where the convolution kernel is usually defined within a small
mask. This kernel is better known as the structuring element of the operator, and its size and
shape determine the outcome of the operation. Morphological operations apply
structuring elements to an input image, creating an output image of the same size.
The choice of size and shape of the structuring element is therefore a major design
issue in our image preprocessing strategy.

Image Enhancement. The images under study contain many seminiferous tubules of
different sizes that may be touching each other. To increase the potential
for later object discrimination, we therefore use a suitable combination of the top-hat and
bottom-hat operations. The top-hat transform is defined as the difference between the
original image and its opening. The opening of an image is the collection of
foreground parts of the image that fit a particular structuring element. The bottom-hat
transform is defined as the difference between the closing of the original image and
the original image. The closing of an image is the collection of background parts of the
image that fit a particular structuring element. We evaluated structuring elements of
different shapes and sizes, obtaining the best results with an octagonal structuring
element. A flat octagonal structuring element was created, with its radius computed from the
minimum horizontal diameter of the smallest seminiferous tubule in the image under
study. Figure 1b shows the result of the image enhancement.
The mathematical formulation used to enhance M was:

\[
\begin{aligned}
E &= A - B && \text{where $E$ is the enhanced image} \\
A &= M + T \\
B &= \varphi(M) - M && \text{bottom-hat of $M$} \\
\varphi(M) &= M \bullet K = (M \oplus K) \ominus K && \text{close $M$ with the structuring element $K$} \\
T &= M - \gamma(M) && \text{top-hat of $M$} \\
\gamma(M) &= M \circ K = (M \ominus K) \oplus K && \text{open $M$ with the structuring element $K$}
\end{aligned}
\]

where $M$ is the median image, $K$ is the structuring element, $\ominus$ is the erode operator,
and $\oplus$ is the dilate operator.
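
The enhancement E = (M + T) - B can be sketched as follows. This is a hedged illustration only: the helper names are hypothetical, a flat square window of the given radius stands in for the octagonal structuring element used in the paper, windows are clipped at the image border, and the result is clamped to 0..255.

```python
def _morph(image, radius, pick):
    """Apply min (erosion) or max (dilation) over a flat square window."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = pick(
                image[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius, rows - 1) + 1)
                for cc in range(max(c - radius, 0), min(c + radius, cols - 1) + 1))
    return out

def erode(image, radius):
    return _morph(image, radius, min)

def dilate(image, radius):
    return _morph(image, radius, max)

def enhance(m, radius):
    """E = (M + T) - B with T the top-hat and B the bottom-hat of M."""
    opened = dilate(erode(m, radius), radius)   # opening: erosion, then dilation
    closed = erode(dilate(m, radius), radius)   # closing: dilation, then erosion
    rows, cols = len(m), len(m[0])
    e = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            top_hat = m[r][c] - opened[r][c]
            bottom_hat = closed[r][c] - m[r][c]
            e[r][c] = min(255, max(0, m[r][c] + top_hat - bottom_hat))
    return e
```

On a flat background, a small bright detail becomes brighter and a small dark detail becomes darker, which is the contrast stretching the combination is meant to provide.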

Objects Differentiation. The image complement in combination with flood-fill
were the operations used for object (seminiferous tubule) differentiation (Fig. 1c).
The image complement consists in subtracting each pixel value from the maximum
gray level supported by the image (in our case 255). Image flood-fill fills
holes in the input image. A hole is a set of background pixels that cannot be reached
by filling in the background from the edge of the image; in our case, a hole is an area
of dark pixels surrounded by lighter pixels.

The mathematical formulation to achieve objects differentiation (image X) was
the following:

\[
X = \left[ \mathit{ff}\left(E^{c}\right) \right]^{c}, \qquad E^{c} = 255 - E
\]

where $\mathit{ff}$ is the flood-fill operation, $E$ is the enhanced image, and $E^{c}$ is the
image complement of $E$.
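
The two building blocks named here, image complement and hole filling, can be sketched as follows. This is an illustration, not the prototype's code: the function names are hypothetical, 4-connectivity is assumed, and hole filling is done with a priority-flood scheme, which is one common way to raise dark areas that are enclosed by lighter pixels. How the two operations are chained is left to the caller.

```python
import heapq

def complement(image, max_level=255):
    """Subtract every pixel from the maximum gray level."""
    return [[max_level - v for v in row] for row in image]

def fill_holes(image):
    """Raise every dark area that is enclosed by lighter pixels.

    Each pixel is lifted to the lowest level that must be crossed on a
    4-connected path to the image border (priority-flood scheme).
    """
    rows, cols = len(image), len(image[0])
    filled = [[None] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                filled[r][c] = image[r][c]          # border pixels keep their value
                heapq.heappush(heap, (image[r][c], r, c))
    while heap:
        level, r, c = heapq.heappop(heap)           # lowest processed level first
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and filled[rr][cc] is None:
                filled[rr][cc] = max(image[rr][cc], level)
                heapq.heappush(heap, (filled[rr][cc], rr, cc))
    return filled
```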

2.3 Edge Detection 

Edge detection is a critical step, since edge information is a major driving factor in
subsequent “snake” performance. As mentioned at the beginning of this
section, different techniques were tested, such as a combination of noise suppression
by average and median filtering with different masks, and histogram thresholding,
followed by binarization and edge tracking [7]. We also tried edge map
detectors [8], but these methods frequently fail where tubular sections are out of the
focal plane or where there is background contamination. However, we found that applying a
local median average to the images distinguishes
boundary pixels between objects and background more reliably. With a 3x3 mask size this
new filter is a powerful edge detector, which produces a fine edge map contour (Fig.
1d).

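The mathematical formulation of our edge detector is the following:

\[
E_{map}(i_0, j_0) = \frac{1}{(2M+1)^2} \sum_{k=-M}^{M} \sum_{l=-M}^{M}
\left| X(i_0 + k,\, j_0 + l) - \tilde{\omega} \right|
\]

where $X$ is the input image, $\omega$ is the $N \times N$ window centered on pixel
$(i_0, j_0)$ with $N = 2M+1$, $\tilde{\omega}$ is the median of $\omega$, and $E_{map}$ is the
output image.

286    M.A. Guevara et al.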





Fig. 1. (a) Original image, (b) Enhanced image, (c) Tubules differentiation, (d) Local median 
average. 
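
The local median average described in Section 2.3 can be sketched as below: for each pixel, take the window median and average the absolute differences between the window pixels and that median. The function name `local_median_average` and the replication of border pixels are assumptions made for this illustration.

```python
from statistics import median

def local_median_average(x, m=1):
    """Mean absolute deviation from the window median, (2m+1) x (2m+1) window."""
    rows, cols = len(x), len(x[0])
    e_map = [[0.0] * cols for _ in range(rows)]
    for i0 in range(rows):
        for j0 in range(cols):
            window = [x[min(max(i0 + dk, 0), rows - 1)]
                       [min(max(j0 + dl, 0), cols - 1)]
                      for dk in range(-m, m + 1)
                      for dl in range(-m, m + 1)]
            med = median(window)
            e_map[i0][j0] = sum(abs(v - med) for v in window) / len(window)
    return e_map
```

With `m=1` this is the 3x3 mask mentioned in the text: flat areas give zero and the response is concentrated on the pixels on either side of a boundary.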



3 Deformable Models 

The mathematical foundations of deformable models represent the confluence of
geometry, physics, and approximation theory. Geometry is used to represent object
shape, physics imposes constraints on how the shape may vary over space and time, and
optimal approximation theory provides the formal underpinnings of
mechanisms for fitting the models to measured data. Deformable curve, surface, and
solid models gained popularity after they were proposed by Terzopoulos for use in
computer vision [12] and computer graphics [13] in the mid-1980s. Terzopoulos
introduced the theory of continuous (multidimensional) deformable models in a 
Lagrangian dynamics setting, based on deformation energies in the form of 
(controlled continuity) generalized splines. The deformable model that has attracted 
the most attention to date is popularly known as “snakes” [6]. Snakes are planar 
deformable contours that are useful in several image analysis tasks. They are often 
used to approximate the locations and shapes of object boundaries in images based on 
the reasonable assumption that boundaries are piecewise continuous or smooth. 

One of the more successful implementations of parametric active contours is the
gradient vector flow (GVF) field approach proposed by Xu and Prince [4], [5]. The
active contour that uses the GVF field as its external force is called a GVF snake. The
GVF field points toward the object boundaries when very near the boundary, but
varies smoothly over homogeneous image regions, extending to the image border. The
main advantages of the GVF field over the original snake model are that it can capture
a snake from a long range, from either side of the object boundary, and can force it
into concave regions. However, although the GVF model proved superior at
capturing a snake from a long range, it did not
solve the difficulties posed by background contamination and the loss of contrast
between seminiferous tubule edges. Our work therefore focused on developing a
specific approach to enhance contrast and to improve the edge map of the images under
study. These processes were described in detail in the preprocessing section. We use
the GVF snake model [4], [5] to complete the segmentation process, which can be
summarized as follows. For each edge map (E_map) we compute the GVF field (Fig. 2a).
Polygons are then defined in a semiautomatic way, starting from a few edge points of
the seminiferous tubules that are selected manually. The initial
snakes (Fig. 2a) are then produced through spline approximation based on the selected
polygons. Finally, the GVF snake deformation is performed, producing an array of
edge points g (final snakes) with a more accurate segmentation (Fig. 2b). We
used the following parameter values in the snake deformation process: elasticity
(0.05), rigidity (0.0), viscosity (1), external force weight (0.6) and pressure force
weight (0).
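
For reference, the GVF field and the snake evolution it drives can be written as below. This is the formulation of Xu and Prince [4], [5] as commonly stated, not a transcription from this paper. Mapping the listed parameters to the symbols $\alpha$ (elasticity), $\beta$ (rigidity), $\gamma$ (viscosity) and $\kappa$ (external force weight) is an assumption, as are the edge map symbol $f$ and the regularization weight $\mu$; the pressure force term is omitted because its weight is 0 here.

```latex
% GVF field \mathbf{v}(x,y) = (u(x,y), v(x,y)) for an edge map f: minimizer of
\mathcal{E}(\mathbf{v}) = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right)
    + |\nabla f|^2 \, |\mathbf{v} - \nabla f|^2 \; dx \, dy

% Snake \mathbf{x}(s,t) driven by the GVF field as external force:
\gamma \, \mathbf{x}_t(s,t) = \alpha \, \mathbf{x}''(s,t) - \beta \, \mathbf{x}''''(s,t)
    + \kappa \, \mathbf{v}\big(\mathbf{x}(s,t)\big)
```

Under this reading, the reported rigidity of 0.0 removes the fourth-derivative term, so the contour is regularized by the elasticity term alone.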



4 Features 

Morphometric features express the overall size and shape of objects. For these
features only the object mask O and its border g are needed, not the actual gray scale
image [14]. In our case, shape and size were decisive for discriminating seminiferous
tubules, allowing analysis of their dilation and contraction in the images under study.
Traditional methods only take into consideration the smallest horizontal diameter to
differentiate tubules. Our proposal introduces the computation of minimum and
maximum diameters and area. We use as input the final snake deformation g (see
section 3), which, as described before, is the array of detected edge points of the
seminiferous tubules.

The mathematical formulation and computational sequence of the measurements is the
following:

\[
\begin{aligned}
& O && \text{object pixels (seminiferous tubule)} \\
& g \subset O && \text{set of edge pixels, contour of $O$ (final snake deformation points)} \\
& p_g \in g && \text{edge point} \\
& A = |O| && \text{area = number of elements of $O$} \\
& c_i = \frac{1}{A} \sum_{(i,j) = p \in O} i && \text{$i$-coordinate of centroid} \\
& c_j = \frac{1}{A} \sum_{(i,j) = p \in O} j && \text{$j$-coordinate of centroid} \\
& C && \text{centroid point with coordinates $(c_i, c_j)$} \\
& d(p_1, p_2) && \text{Euclidean distance between $p_1$ and $p_2$} \\
& rad = \min_{p_g} d(C, p_g) && \text{minimum radius} \\
& p_{g\_diam}(0) = p_g \text{ with } d(C, p_g) = rad && \text{initial minimum diameter point} \\
& RAD = \max_{p_g} d(C, p_g) && \text{maximum radius} \\
& p_{g\_DIAM}(0) = p_g \text{ with } d(C, p_g) = RAD && \text{initial maximum diameter point} \\
& P_{diam} = p_{g\_diam}(0) + 2 \cdot rad \\
& p_{g\_diam}(1) = \arg\min_{p_g} d(P_{diam}, p_g) && \text{final minimum diameter point} \\
& diam = d(p_{g\_diam}(1), p_{g\_diam}(0)) && \text{minimum diameter} \\
& P_{DIAM} = p_{g\_DIAM}(0) + 2 \cdot RAD \\
& p_{g\_DIAM}(1) = \arg\min_{p_g} d(P_{DIAM}, p_g) && \text{final maximum diameter point} \\
& DIAM = d(p_{g\_DIAM}(1), p_{g\_DIAM}(0)) && \text{maximum diameter}
\end{aligned}
\]

Finally, the averages of the area and of the minimum and maximum diameters are computed over
the set of seminiferous tubules processed in the image under study (see Figure 2c).
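
The measurement sequence can be sketched in Python as follows. This is one possible reading rather than the authors' code: the function name `tubule_features` is hypothetical, and the step "initial point + 2 * radius" is interpreted as mirroring the initial contour point through the centroid and then snapping to the nearest contour point.

```python
from math import hypot

def tubule_features(obj_pixels, contour):
    """Area, centroid, and minimum/maximum diameters of one object.

    obj_pixels: list of (i, j) pixels of O; contour: list of (i, j) edge points g.
    """
    area = len(obj_pixels)
    ci = sum(i for i, _ in obj_pixels) / area
    cj = sum(j for _, j in obj_pixels) / area
    centroid = (ci, cj)

    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    def diameter(start):
        # Assumed reading of "P = p(0) + 2 * radius": go from the start point
        # through the centroid to the mirrored position, then snap to the
        # nearest contour point.
        mirrored = (2 * ci - start[0], 2 * cj - start[1])
        end = min(contour, key=lambda p: dist(p, mirrored))
        return dist(start, end)

    nearest = min(contour, key=lambda p: dist(p, centroid))    # defines rad
    farthest = max(contour, key=lambda p: dist(p, centroid))   # defines RAD
    return {"area": area, "centroid": centroid,
            "diam": diameter(nearest), "DIAM": diameter(farthest)}
```

The values are in pixels; converting them to microns requires the pixel size of the acquisition setup, which is not given in this section.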
Fig. 2. (a) GVF and initial snakes, (b) Final snake deformation, (c) Measurements in microns for each
object (seminiferous tubule): area, maximum diameter (DM) and minimum diameter (dm), and
the averages of area, maximum diameter and minimum diameter.



5 Results 

The practical implementation of our technique for evaluating testicular toxicity in
histological section images can be summarized as follows. For each image, we first
construct the edge map (E_map) based on the procedures described in section 2. Then the
GVF field is computed. Next, polygons are defined in a semiautomatic way, based on the
manual selection of a few edge points of the seminiferous tubules. The initial snakes
are then produced (Fig. 2a) through spline approximation based on the selected
polygons. Finally, snake deformation is performed to produce an array of edge
points g (final snakes) with a more precise segmentation (Fig. 2b).

The traditional method for measuring seminiferous tubule diameter is based on
measurement of the minimum diameter. However, it is difficult, if not impossible, for
the human eye to distinguish which is the minimum diameter, so an approximation
must be made. This approximation may lead to a loss of accuracy, mainly in
toxicology studies, because if we cannot be sure that we are measuring the same
feature, it is difficult to compare different treatments. Our new method allows
the measured tubules to be numbered and provides a range of parameters related to each
tubule (e.g. area, minimum and maximum diameter). In the end, we know exactly




Segmentation and Morphometry of Histological Sections 289 



which tubules were measured and the characteristics of each one. With the traditional
method, on the other hand, we cannot mark the tubules that are being measured,
so the same tubule can be measured twice, and we cannot be sure that the
measurement is made along the minimum diameter.



6 Conclusions 

We presented a semiautomatic method to segment and measure histological
testicular sections. Its first step relies on a suitable combination of
statistical and morphological operations. The segmentation process is then completed
from the detected edges using a parametric deformable model. Our
technique allows a suitable segmentation of seminiferous tubules in histological
sections. The capability of our method was demonstrated on a representative experimental
data set. This approach is suited to the quantitative study of
histological sections of different specimens and can be extended easily to other fields
of study, such as cell counting and somatic embryo classification.
Compared with traditional methods, our method computes two new measures, the
maximum diameter and the area, and it facilitates measurement, allowing a higher
number of tubules to be assessed in a faster and reproducible way while minimizing the typical errors.
Abstract. In this paper, we present a new algorithm to obtain robust markers
for blood vessel segmentation in malignant tumors. We propose a two-stage
segmentation strategy which involves: 1) extracting an approximate region
containing blood vessels and part of the background, and 2) segmenting blood
vessels from the background within this region. The approach proved
useful in blood vessel segmentation, and its validity was tested by using
the watershed method. The proposed segmentation technique was tested against
manual segmentation. Extensive experimentation with
real images shows that the proposed strategy is suitable for our application.



1 Introduction 

Information contained within sampled medical image data sets is essential to several
clinical tasks. Advances in technology have allowed clinicians not only to visualize but
also to identify different objects. A major hurdle in the effective use of this
technology is image segmentation, where pixels are grouped into regions based on
image features [1].

In the study of the angiogenesis process, pathologists analyse all information
related to blood vessels by using a microscope [2-4]. This work is very tedious and
time consuming, and the automation of this analysis is highly desirable. In
this sense, a useful task for digital image processing is the segmentation
of blood vessels.

Segmentation and contour extraction are key points of image analysis. Many 
segmentation methods have been proposed for medical image data [5-7]. 
Unfortunately, segmentation using traditional low level image processing techniques, 
such as thresholding, gradient, and other classical operations, requires a considerable 
amount of interactive guidance in order to get satisfactory results. Automating these 
model free approaches is difficult because of shape complexity, shadows, and 
variability within and across individual objects. Furthermore, noise and other image 
artifacts can cause incorrect regions or boundary discontinuities in objects recovered 
from these methods. 

The watershed transform is a powerful segmentation tool developed in mathematical
morphology [8, 9]. However, the correct way to use watersheds for grayscale image
segmentation consists in first detecting markers of the objects to be extracted. The
design of robust marker detection techniques involves the use of knowledge specific
to the images under study. Not only object markers, but also background markers need to
be extracted.
The goal of this paper is to present a new algorithm to obtain robust markers for
blood vessel segmentation in malignant tumors. The validity of our strategy was
tested by using the watershed method; according to the criterion of physicians, the
blood vessels were contoured well.

This paper is organized as follows. Section 2 outlines the theoretical aspects.
In Section 3, we give the features of the studied images. In Section 4, we present the
steps to obtain markers and discuss an algorithm. Finally, we give our
conclusions in Section 5.



2 Theoretical Aspects 

This section presents the most important theoretical aspects. 

2.1 Pre-processing 

To reduce the noise in the original images we used a Gaussian filter,
carrying out Gaussian smoothing with σ = 3 and a 3x3
window. We also carried out a morphological opening, using a
rhombus-shaped structuring element of size 3x3.
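
The stated pre-processing parameters can be made concrete with a short sketch. The function name `gaussian_kernel` and the constant `RHOMBUS_3X3` are hypothetical, and sampling the Gaussian at integer offsets followed by normalization is an assumption about how the 3x3 window was filled.

```python
from math import exp

def gaussian_kernel(size=3, sigma=3.0):
    """Normalized size x size Gaussian weights (size must be odd)."""
    half = size // 2
    raw = [[exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

# 3x3 rhombus (diamond) structuring element: the centre and its 4-neighbours.
RHOMBUS_3X3 = [[0, 1, 0],
               [1, 1, 1],
               [0, 1, 0]]
```

With σ = 3 on a 3x3 window the weights come out nearly uniform, so under these assumptions the smoothing behaves much like a 3x3 averaging filter.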

2.2 Watershed Segmentation 

Let us consider a two-dimensional grayscale picture F whose definition domain is
denoted D_F ⊂ Z². F is assumed to take discrete (gray) values in a given range [0, L],
L being an arbitrary positive integer. In the following, we consider grayscale images
as numerical functions or as topographic reliefs.

Definition 1 (Regional Minimum). 

A regional minimum M at altitude h of grayscale image F is a connected component C
of T_h(F) such that C ∩ T_{h-1}(F) = ∅, T_h(F) being a threshold of F at level h.
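
Definition 1 can be checked on small images with the sketch below, which marks every connected plateau of constant altitude that has no lower neighbour. The function name `regional_minima` and the use of 4-connectivity are assumptions made for illustration.

```python
from collections import deque

def regional_minima(f):
    """Return a 0/1 mask of the regional minima of grayscale image f."""
    rows, cols = len(f), len(f[0])
    seen = [[False] * cols for _ in range(rows)]
    minima = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Collect the connected plateau of constant altitude h = f[r][c].
            h, plateau, is_min = f[r][c], [], True
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                pr, pc = queue.popleft()
                plateau.append((pr, pc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = pr + dr, pc + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    if f[rr][cc] < h:
                        is_min = False          # a lower neighbour exists
                    elif f[rr][cc] == h and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if is_min:
                for pr, pc in plateau:
                    minima[pr][pc] = 1
    return minima
```

A plateau with no lower neighbour is a connected component of the threshold at level h that does not meet the threshold at level h - 1, which is the condition stated in the definition.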
Definition 2 (Watershed by Immersion). 

Imagine that holes have been pierced in each regional minimum of F, this picture
being regarded as a topographic surface. We then slowly immerse this surface into a
lake. Starting from the minimum of lowest altitude, the water progressively fills up
the different catchment basins of F. At each pixel where the water coming from
two different minima would merge, we build a dam (see Fig. 1). At the end of this
immersion procedure, each minimum is completely surrounded by dams, which
delimit its associated catchment basin. The whole set of dams that has been built
thus provides a tessellation of F into its different catchment basins. These dams
correspond to the watershed of F; that is, they represent the edges of objects.



3 Features of the Studied Images 

The studied images were of arteries with atherosclerotic lesions, obtained
from different parts of the human body in more than 80 autopsies. Figure
2 shows typical images, which were captured via the MADIP system with a resolution
of 512x512x8 bit/pixel [10].
Fig. 1. Building dams at the places where the water coming from two different minima would
merge.




Fig. 2. These images represent the angiogenesis process. Blood vessels are marked with arrows. 

These images have several notable characteristics, which are common to the typical
images that we encounter in biopsy tissues:

1. High local variation of intensity is observed both within blood vessels (BV) and
in the background. However, the local variation of intensity is higher within BV
than in background regions.

2. The BV in these images vary widely in shape and size.

4 Experimental Results: Discussion 

4.1 Obtaining the Region of Interest 

The next stage of the strategy is to segment the approximate region, that is, a region 
which contains the blood vessel and its neighboring background. The exact shape and 
size of this region are not important, and hence the region is referred to as an 
approximate region. A measure of local variation of intensity is provided by the 
variance of the gray level intensity. The result of applying this procedure is shown in 
Figure 3. 

Figure 3 shows that high variance corresponds to blood vessels, while
low variance corresponds to the background. We observed that
large window sizes gave poor results: the region became more
homogeneous, but the blood vessels were notably fattened.

We obtained the region of interest by applying a global threshold to the local 
variation of intensity. Figure 4 shows the obtained result of this segmentation process. 
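
The region-of-interest step can be sketched as a local variance map followed by a global threshold. This is a minimal illustration: the function names are hypothetical, windows are clipped at the image border (an assumption), and the threshold is left as a parameter because no value is given in the text.

```python
def local_variance(image, size=5):
    """Variance of the gray levels in a size x size window (clipped at borders)."""
    rows, cols, half = len(image), len(image[0]), size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(r - half, 0), min(r + half, rows - 1) + 1)
                      for cc in range(max(c - half, 0), min(c + half, cols - 1) + 1)]
            mean = sum(window) / len(window)
            out[r][c] = sum((v - mean) ** 2 for v in window) / len(window)
    return out

def region_of_interest(image, threshold, size=5):
    """Binary mask: 1 where the local variance exceeds the global threshold."""
    var = local_variance(image, size)
    return [[int(v > threshold) for v in row] for row in var]
```

A larger `size` smooths the variance map but also widens the band of high variance around each vessel, consistent with the fattening effect reported for large windows.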

Then, we introduced the following algorithm to obtain robust markers for blood 
vessels segmentation in malignant tumors. 
Fig. 3. Obtaining a region of interest. The variance map was obtained with a 5x5 window.

Fig. 4. (a) Original image, (b) Regions of interest.



4.2 Algorithm to Obtain Robust Markers 

The steps of the algorithm are described below. 

1. Obtain the regions of interest. Let IREZI be the resulting image. 

2. Label the image resulting from step 1. To do so, create an auxiliary image IA1 
with all pixels set to zero. Scan IREZI iteratively and label all background 
pixels in IA1 with the value 1. 

3. To find the connected components (BV), scan IREZI again from top to bottom and 
from left to right. When a pixel is found that belongs to a connected component 
and has value zero in IA1, a second iterative method is started. This method 
marks all pixels of that connected component in IA1 with a given value. The 
corresponding pixels in IREZI are also marked with a value that identifies the 
connected component to which they belong. This is carried out over the whole 
image. When this step finishes, all connected components in IREZI have been 
filled and all connected components in IA1 have been labeled. 

4. Create another auxiliary image, IA2, with the same values as IA1. Also create 
an array that records whether each connected component has been reduced. The 
reduction of the connected components is computed step by step in IA2; the 
final result is stored in IA1. 

5. Scan the labeled image IA1. When a pixel belonging to a connected component is 
found, the component is reduced by another iterative method, and all frontier 
pixels of the component are marked in IA2. If the component still contains a 
pixel that is not a frontier pixel, the frontier pixels are eliminated in IA2 
and IA1 and the procedure is repeated until all remaining points are frontier 
points. The result of this reduction is taken as the mark. The array of step 4 
then records that the component with this label has been processed, and the 
scan continues with the next component. 

6. Finish when IA1 has been completely scanned. At this point IA1 contains all 
the marks of the BV. These marks are placed in the IREZI image, in which the 
connected components were filled after step 2. IREZI is the resulting image. 
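
The labeling and reduction steps above can be illustrated with a minimal Python 
sketch. It is not the authors' implementation: it assumes a binary image whose 
connected components are already filled, uses 4-connectivity, and replaces the 
auxiliary images IA1 and IA2 with Python sets. All function names are 
hypothetical. 

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def label_components(binary):
    """Label foreground regions of a 0/1 image with an iterative flood fill."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):            # scan from top to bottom
        for c in range(cols):        # and from left to right
            if binary[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def reduce_component(pixels):
    """Peel frontier layers until every remaining pixel is a frontier pixel."""
    pixels = set(pixels)
    while True:
        frontier = {(y, x) for (y, x) in pixels
                    if any((y + dy, x + dx) not in pixels
                           for dy, dx in NEIGHBOURS)}
        if frontier == pixels:
            return pixels            # the remaining pixels form the mark
        pixels -= frontier


def markers(binary):
    """Return {label: set of (row, col) marker pixels} for each component."""
    labels, count = label_components(binary)
    components = {k: set() for k in range(1, count + 1)}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                components[k].add((r, c))
    return {k: reduce_component(p) for k, p in components.items()}
```

Because a frontier layer is removed only when interior pixels remain, each 
component keeps at least one marker pixel, and that pixel lies inside the 
component. 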

Figure 5 shows the results of applying this algorithm. Figure 5(b) shows that each 
blood vessel receives a unique mark, which always lies within the vessel. 
Figure 6 depicts all steps of the proposed strategy for obtaining robust markers in 
blood vessel images. 




Fig. 5. (a) Image with regions of interest. The arrows indicate the connected 
components and the holes within them. (b) Image with the connected components 
filled; the interior hole is the obtained mark. (c) Original image superimposed on 
the obtained marks, some of which are indicated with arrows. 



4.3 Application of the Obtained Strategy in the Watershed Segmentation 

In many digital image processing applications it is very important to eliminate 
unwanted minima and to keep only the necessary ones (see Section 2). In the 
following, we propose a way to detect these necessary minima: 



[Figure 6: flow diagram; recoverable block label: "Filtered image by Gauss"] 

Fig. 6. Steps to obtain markers in angiogenesis images. 



(a) Carry out a reconstruction by geodesic dilations, using a rhombus of size 5x5 
(in pixels) as structuring element and the constant h = 30 [8]. 

(b) Compute the morphological gradient of the image resulting from the 
reconstruction. 

(c) Using a horizontal profile of the gradient image, determine which pixels belong 
to the background and which to the blood vessels. Two thresholds are obtained in 
this way. Then apply a thresholding that assigns one value to the blood vessels 
and another to the background. 

The result of applying steps (a), (b) and (c) is shown in Figure 7. In many 
practical cases the watershed segmentation produces an over-segmentation if robust 
markers are not obtained. In fact, when the transformation is applied directly to 
the gradient image, an over-segmentation is produced because of the noise present 
in the original image. Figure 8 shows this effect. 
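
As an illustration of step (b), the following minimal Python sketch computes a 
morphological gradient (dilation minus erosion) of a gray-level image stored as a 
list of rows. The text states the 5x5 rhombus only for the reconstruction of 
step (a); reusing it here, and clipping the neighbourhood at the image border, are 
assumptions of the sketch, and the function names are hypothetical. 

```python
def rhombus_offsets(radius=2):
    """Offsets of a rhombus (diamond) structuring element; radius 2 gives 5x5."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dy) + abs(dx) <= radius]


def morphological_gradient(image, radius=2):
    """Return dilation minus erosion of a gray-level image (list of rows)."""
    rows, cols = len(image), len(image[0])
    offsets = rhombus_offsets(radius)
    gradient = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Neighbourhood values, clipped at the image border (assumption).
            values = [image[y + dy][x + dx]
                      for dy, dx in offsets
                      if 0 <= y + dy < rows and 0 <= x + dx < cols]
            gradient[y][x] = max(values) - min(values)
    return gradient
```

The gradient is zero in flat regions and equals the local gray-level range near 
edges, which is why noise in the original image produces many spurious minima when 
the watershed is applied to it directly. 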




Fig. 7. (a) Original image, (b) Resulting image. 



Figure 8(b) shows the poor result obtained when good markers are not available: the 
edges are clearly not correctly defined. In contrast, Figure 9 shows the result 
of applying our strategy. 




Fig. 8. (a) Original image, (b) Watershed segmentation. The over-segmentation is evident. 




Fig. 9. (a) Original image, (b) Image of marks, (c) Watershed segmentation. The 
quality of the obtained result is evident. 



As can be seen in Figure 9(c), the edges of the blood vessels were correctly defined. 
To verify the performance of our strategy, we superimposed the contours on the 
original image. The result is shown in Figure 10. It is evident that the exact 
edges of the blood vessels were obtained. In addition, the obtained edges are 
continuous, which is very important for this application. With the classical 
methods it is not possible to obtain continuous edges. 

Fig. 10. Contours superimposed on the original image. 



5 Conclusions 

In this work, we presented a new strategy to obtain robust markers for blood vessel 
segmentation via the watershed method. To this end, we introduced a new algorithm 
that correctly identifies blood vessels and eliminates a considerable amount of 
spurious information. To achieve this goal, we carried out a study of the region 
of interest. Extensive experimentation with real image data showed that the 
proposed strategy was fast and robust for the images considered. 



Abstract. New results of research on the automation of hematopoietic tumor 
diagnostics through analysis of images of cytological specimens are presented. 
Factor analysis of the numerical, diagnostically important features used to 
describe lymphoma cell nuclei was carried out in order to evaluate the 
significance of the features and to reduce the feature space considered. The 
following results were obtained: a) the proposed features were classified; b) the 
feature set of 47 elements was reduced to 8 informative factors; c) the extracted 
factors made it possible to distinguish some groups of patients. This implies that 
the obtained factors have substantial medical meaning. The results presented in 
the paper support the use of factor analysis in the automated system for 
morphological analysis of cytological specimens in order to build a comprehensive 
model of the phenomenon under investigation. 



1 Introduction 

In this paper, we describe new results of research into the automation of 
hematopoietic tumor diagnostics based on the analysis of images of cytological 
specimens. This work has been conducted since 2000 by researchers of the Scientific 
Council "Cybernetics" of the Russian Academy of Sciences together with researchers 
of the Hematological Scientific Center of the Russian Academy of Medical 
Sciences [1]. A necessary condition for such automation is the development of an 
information technology for morphological analysis of the lymphoid cell nuclei of 
patients with hematopoietic tumors, which could be implemented in a corresponding 
software system for automated diagnostics. The paper is devoted to the 
investigation, by means of factor analysis, of the numerical features used to 
describe lymphoma cell nuclei. Factor analysis, which allows the initial data to be 
reduced and structured, proved to be efficient for the problem of morphological 
analysis of lymphocyte nuclei. 

The paper is organized as follows. Section 2 contains a brief description of the 
developed information technology for the morphological analysis of cytological 
specimens. Section 3 describes the initial data for factor analysis and the methods 
used. The results of factor analysis and some conclusions are presented in 
Section 4. The developed technology is described in full in [4]. 



2 The Main Stages of the Morphological Analysis of the Blood Cells 

The developed information technology for the morphological analysis of cytological 
specimens includes the following stages of data preparation and analysis: 

1. Creation of a database containing images of specimens of lymphatic tissues with 
isolated lymphocyte nuclei for patients with different lymphoid tumors. 

2. Normalization of the images to compensate for different illumination 
conditions and different colors of the stain used for the specimens. 

3. Choice of features that capture morphological characteristics of lymphocyte 
nuclei useful for lymphoma diagnostics. 

4. Calculation of the values of the chosen features, and their statistical and 
qualitative analysis, for the set of available nuclei. 

5. Selection of features for generating feature descriptions of lymphoma cell nuclei. 

6. Cluster analysis of the nuclei using different subsets of the generated set of 
features. 

7. Qualitative and quantitative analysis of the obtained clusters. 

8. Formation of a new feature space for the description of the patients: 

• the 'large' clusters of the cell nuclei are selected; 

• the new features are the relative numbers of a patient's nuclei that belong to the 
selected clusters. 

9. Diagnosis of the patients by means of efficient recognition algorithms (for 
example, recognition algorithms based on estimate calculation [5]) applied to the 
feature descriptions developed in stage 8. 



2.1 General Characteristic of the Source Data 

A database of photomicrographic images of lymphatic tissue imprints was created to 
select and describe diagnostically important features of lymphocyte nuclei images. 
The database contains 1585 photos of specimens from 36 patients. We chose 25 cases 
of aggressive lymphoid tumors (de novo large and mixed cell lymphomas (L) and 
transformed chronic lymphocytic leukemia (TCLL)) and 10 cases of indolent chronic 
lymphocytic leukemia (CLL). In one case reactive lymphoid hyperplasia was 
diagnosed. 

Fig. 1. Monochrome photomicrographic image of an imprint of a lymphatic gland. 
Objective x100. 

The photos of the specimens were collected and stored as RGB images in 24-bit 
TIFF format. Figure 1 shows a monochrome picture of a slide of a lymphatic gland. 
On the original RGB images, experts indicated 4327 nuclei of lymphocytic cells 
important for diagnostics. These nuclei were then segmented and analyzed. 



2.2 Selection and Extraction of Features for Lymphocyte Nuclei Description 

The principal property of the proposed technology is that the description of 
lymphocyte nuclei is generated from features chosen and calculated from the images 
of specimens by methods of image processing and analysis, mathematical morphology, 
and Fourier analysis. 

During the morphological analysis of lymphocyte nuclei, hematologists use the 
following characteristics. 

1. Nuclei size and density of different lymphoid cells in the specimen (presence of 
cells with nucleus larger than in most cells). 

2. Nuclear form (round, oval, folded), presence of invaginations. 

3. Textural features of chromatin (dispersed or condensed pattern; if dispersed - 
visual diameter of chromatin fibrils). 

4. Presence or absence of nucleoli; if present - their number, size, form and location 
in nuclei (central or peripheral). 

We provided formal equivalents of some of the above characteristics, which can be 
translated into geometrical and textural features of cell nuclei. Thus, the 
following 47 features were chosen to describe nuclei morphology: 

1) the area of the nucleus in pixels; 

2) four statistical features calculated on the nucleus brightness histogram 
(average, dispersion, 3rd and 4th central moments); 

3) 16 granulometric features of the nucleus; 

4) 26 features calculated on the Fourier spectrum of the nucleus. 
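
As an illustration of the four statistical features in item 2, the following 
minimal Python sketch computes the average, dispersion, and 3rd and 4th central 
moments of a list of pixel brightness values. Working directly on the pixel values 
rather than on a binned histogram, and dividing by n (population moments), are 
assumptions; the function name is hypothetical. 

```python
def brightness_moments(values):
    """Return (mean, variance, 3rd central moment, 4th central moment)."""
    n = len(values)
    mean = sum(values) / n

    def central(k):
        # k-th central moment, normalized by n (population convention).
        return sum((v - mean) ** k for v in values) / n

    return mean, central(2), central(3), central(4)
```

For a symmetric brightness distribution the 3rd central moment is zero, so this 
feature responds to asymmetry of the nucleus brightness histogram. 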



2.3 Cluster Analysis of the Nuclei 

To preprocess the images of the specimens and to calculate the features of the 
nuclei, a program was developed which uses the libraries of the "Black Square" 
software system [3]. After the features were calculated, their statistical and 
qualitative analysis was carried out. This analysis allowed us to select several 
groups of features, which subsequently yielded diagnostically promising 
taxonomies. 

The main results of the investigation of lymphocytic cells are based on their 
cluster analysis in the developed feature space and its subspaces with the help of 
the FOREL algorithm [2]. 

A number of sets of clusters were obtained by using different sets of features and 
different values of the parameter of the FOREL algorithm. The obtained sets of 
clusters were evaluated using various criteria (e.g., the number and size of 
clusters in each set, the character of the distribution of nuclei in large 
clusters, etc.). Several "interesting" sets of clusters were selected that have 
good formal characteristics and are promising for interpretation by hematologists. 

The obtained results showed that: 

1) the set of diagnostically important nuclei of the patients with the considered 
lymphoid tumors is substantially heterogeneous, since different clusters were 
clearly identified in it; 

2) clustering of the lymphocytic nuclei using the developed feature set is 
important from the medical standpoint; 

3) the formal characterization of a nucleus by the developed feature set 
corresponds well with its qualitative morphological description and serves as a 
basis for the development of automated software systems for morphological 
analysis of blood cells and diagnosis of hemoblastoses; 

4) the suggested technology provides a transition from the diagnostic analysis of 
lymphocyte nuclei to diagnosing patients with hematopoietic tumors by means of 
pattern recognition techniques. 



3 Factor Analysis of the Features of Lymphocyte Nuclei 

As is known, factor analysis allows one to estimate the dimensionality of a set of 
observed variables and to determine the structure of the interconnections between 
these variables. Factor analysis can be used to replace a large set of observed 
variables with a smaller set of new hypothetical variables called factors. These 
factors are treated as principal variables that truly describe the investigated 
phenomenon. 

The initial data were represented as a table whose rows correspond to the feature 
descriptions of the nuclei. Factor analysis was conducted on the whole set of 
nuclei, on different combinations of nuclei taken from patients with different 
forms of hemoblastoses (L, CLL and TCLL), and on 5 large clusters extracted from 
the set of nuclei by means of the taxonomy method. 

The tables of nucleus feature descriptions were normalized by columns by the rule 

$$z_{ij} = \frac{y_{ij} - \bar{y}_j}{s_j},$$ 

where $z_{ij}$ and $y_{ij}$ are the normalized and primary values of the $j$th 
feature of the $i$th nucleus, $s_j$ is the standard deviation of the $j$th feature, 
and $\bar{y}_j$ is the average value of the $j$th feature. As a result of this 
transformation the variances of the features became equal to one. Then the 
correlations of the features were calculated in each group of nuclei. Factor 
analysis was applied to the corresponding reduced correlation matrices with the 
communalities on the principal diagonal. In some cases an iterative procedure was 
applied to estimate the communalities; in others the squared coefficient of 
multiple correlation of each feature with the others was used. The factors were 
extracted by three methods: the principal-factor method, the centroid method and 
the maximum-likelihood method. The number of factors was determined by combining 
the Kaiser criterion and the scree test. Varimax rotation was applied in order to 
obtain a meaningful interpretation of the factors. 
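
The column normalization and the subsequent correlation step can be illustrated 
with a minimal Python sketch. It assumes the population standard deviation 
(division by the number of nuclei) and a table without constant columns; the 
function names are hypothetical, and the sketch does not cover the communality 
estimation or the factor extraction itself. 

```python
import math


def normalize_columns(table):
    """Z-score each column: z_ij = (y_ij - mean_j) / s_j."""
    n = len(table)
    columns = list(zip(*table))
    means = [sum(col) / n for col in columns]
    # Population standard deviation (assumption); constant columns would
    # give s_j = 0 and are not handled in this sketch.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in table]


def correlation_matrix(z):
    """Correlation matrix of already normalized columns."""
    n = len(z)
    columns = list(zip(*z))
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns]
            for ci in columns]
```

After normalization every column has zero mean and unit variance, so the 
correlation between two features reduces to the average product of their 
normalized values. 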

It is known that the factor problem has an ambiguous solution that depends on the 
restrictions imposed (the choice of factorial method). However, if a single simple 
factor explanation is inherent in the data, different methods should yield nearly 
the same mappings, so it is possible to establish a correspondence between the 
factors from different solutions. Thus, the next stage of the analysis was the 
search for correspondences between the factor mappings obtained with the 3 
different methods. Then the average factor loadings of each feature over the 3 
methods were calculated, and the factors were ranked by the maximal percentage of 
variance explained by them. 

The factor analysis made it possible to classify the numerical features into 
independent groups according to their cross-correlation and to define the dominant 
factors. Each new factor proved to be a linear combination of several initial 
features that have high loadings (exceeding 0.7) on this factor (the most 
informative features). Additional analysis showed that 0.7 is an appropriate 
threshold, since the combination of features with high loadings did not vary 
significantly when this value was decreased. 
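
The selection of the most informative features by the loading threshold can be 
sketched as follows. The layout of the loading matrix (one row per feature, one 
column per factor), the use of the absolute value of the loading, and the function 
name are assumptions made for illustration only. 

```python
def informative_features(loadings, threshold=0.7):
    """Group feature indices by the factors on which they load highly.

    loadings[f][k] is the loading of feature f on factor k (assumed layout).
    """
    n_factors = len(loadings[0])
    groups = {k: [] for k in range(n_factors)}
    for f, row in enumerate(loadings):
        for k, value in enumerate(row):
            if abs(value) > threshold:   # absolute value is an assumption
                groups[k].append(f)
    return groups
```

Lowering the threshold slightly and checking that the groups stay the same mirrors 
the stability check described above. 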



4 The Results of Factor Analysis 

The results of factor analysis are presented in Table 1. The factor mappings 
obtained on the different groups of cell nuclei are presented in the columns of 
Table 1, and factors of the same significance level are presented in the rows. An 
increasing row number corresponds to decreasing statistical significance of the 
factor, which is determined in turn by the decrease of the variance of this factor. 
Thus, we come to the following conclusions: 

1. The initial set of features (47 elements) breaks down on average into 8 
significant groups (factors). In some cases the 3 factors of greatest information 
density (which explain the largest part of the total variance of the features) 
combine into a single factor (general distribution of features by factors), 
namely in the group of patients with the TCLL diagnosis and in the first cluster; 
in the groups of patients with CLL and L diagnoses they combine into 2 factors. 
For this reason the number of important factors varies. The increase in the 
number of factors on smaller samples of nuclei may be explained by the following 
rule: if the dimensionality of the considered sample decreases, then up to a 
certain point the accuracy of its description by a greater number of factors 
increases. 

2. The factor mappings explain on average 75% of the total variance of the 
features. 

3. The analysis allows the following classification of the features to be 
established. 

a. The main factors include nearly all texture features, feature 1 (the area of 
the nucleus in pixels) and the 3 granulometric features 14, 15 and 16 (the 
total number of light grains in the nucleus, the number of grains of typical 
size and the number of grains of minimal size, respectively). The features 
included in the main factors are of the greatest importance for hematopoietic 
tumor diagnostics and should be considered first. 

b. The second dominant factor includes the statistical features 3, 4 and 5: the 
variance and the 3rd and 4th central moments calculated on the nucleus 
brightness histogram, respectively. 

c. Features 6 and 7 (the average and the variance calculated on the nucleus size 
histogram, respectively) as a rule fall into the same factor, which has minor 
statistical significance. 

d. Granulometric features 10 and 18 (the number of grains with sizes 
corresponding to local maxima and local minima of the constructed functions) 
also occur together. 

The coefficients of cross-correlation of the features within each factor are 
sufficiently high. 

4. Some features proved to be inessential (they have minor factor loadings) in all 
kinds of analysis, so they contribute little new information. All these variables 
belong to the class of granulometric features of the nuclei: features 8 and 9 (the 
3rd and 4th central moments calculated on the nucleus brightness histogram) and 
features 11, 12 and 13 (the typical, minimal and maximal size of light grains in 
the nuclei, respectively). Features 9, 11, 12 and 13 consistently have minor 
pair correlations with the other parameters (< 0.7). 

5. The clusters and groups of patients are distinguished by the factors. It is 
important to note that some groups of nuclei contain features that do not occur in 
the sets of the other groups. 

Table 1. The results of factor analysis of the features for different groups of 
nuclei. The first column contains the factor numbers; in the remaining columns the 
first number is the feature number and the second number is the value of the 
respective factor loading. 



5 Conclusion 

Factor analysis, which allows the initial data to be reduced and structured, proved 
to be efficient for the problem of morphological analysis of lymphocyte nuclei. It 
confirmed that the proposed feature space, which reflects the morphological 
characteristics of lymphocyte nuclei used in diagnostics, has a sufficiently simple 
factor structure. The main goals of factor analysis were achieved: we succeeded in 
reducing the feature set of 47 elements to 8 informative factors and in classifying 
the proposed features. An important result is that the extracted factors make it 
possible to distinguish some groups of patients. This implies that the obtained 
factors have meaningful medical content. The results presented above are the 
prerequisites for using factor analysis in the automated system for morphological 
analysis of cytological specimens in order to build a comprehensive model of the 
phenomenon under investigation. 

In the future we intend to carry out factor analysis on other samples of patients 
in an extended feature space in order to confirm the existence of the extracted 
factors, and to explore new methods of factor analysis as well. 



Abstract. Traditional authentication (identity verification) systems, used to 
gain access to a private area in a building or to data stored in a computer, 
are based on something the user has (an authentication card, a magnetic key) 
or something the user knows (a password, an identification code). Emerging 
technologies, however, allow for authentication methods that are more 
reliable and more comfortable for the user, most of them based on biometric 
parameters. Much work on biometric authentication can be found in the 
literature, using parameters such as iris, voice, fingerprint, face 
characteristics, and others. In this work a novel authentication method is 
presented, together with first results. The biometric parameter used for 
authentication is the retinal vessel tree, acquired through a retinal 
angiography. Expert clinicians have already asserted that the configuration 
of the retinal vessels is unique for each individual and does not vary during 
his or her life, so it is a very well suited identification characteristic. 
Before the verification process can be executed, a registration step is 
needed to align the reference image and the picture to be verified. A fast 
and reliable registration method is used to perform that step, so that the 
whole authentication process takes very little time. 



1 Introduction 

Reliable authentication of people has long been an interesting goal, and it 
becomes more important as the need for security grows, so that access to a 
reliable personal identification infrastructure is an essential tool in many 
situations (airport security controls, all kinds of password-based access 
controls, ...). Conventional methods of identification based on possession of ID 
cards or exclusive knowledge are not altogether reliable. ID cards can be lost, 
forged or misplaced; passwords can be forgotten or compromised. A solution to 
these problems has been found in biometric authentication technologies. A 
biometric system is a pattern recognition system that establishes the 
authenticity of a specific physiological or behavioral characteristic possessed 
by a user. Identification can take the form of verification, authenticating a 
claimed identity, or recognition, determining the identity of a person from a 
database of known persons (determining who a person is without knowledge of 
his/her name). 

Many authentication technologies can be found in the literature, some of them 
already implemented in commercial authentication packages [1]. Other methods are 
fingerprint authentication [2] [3] (perhaps the oldest of all the biometric 
techniques), hand geometry [4], face recognition [5] and speech recognition [6]. 
It has also been shown that, for a more reliable system, a combination of two or 
more of these techniques can be a good choice [7]. 

Today, however, most of the effort in authentication systems goes into developing 
more secure environments, in which it is harder, or ideally impossible, to create 
a copy of the properties used by the system to discriminate between authorized 
and unauthorized individuals, a copy that would let an impostor be accepted by 
the biometric system as a true sample. 

In that sense, the system proposed here uses the blood vessel pattern in the 
retina of the eye as the biometric parameter for authentication: it is a unique 
pattern in each individual, and it is almost impossible to forge that pattern in a 
false individual. Moreover, the pattern remains the same from birth until death, 
unless a pathology appears in the eye. A commercial authentication system is 
available at http://www.eye-dentify.com, in which characteristic points 
extracted from the vessels are used to measure the similarity between images. 
Here a novel authentication method based on the whole retinal vessel pattern of 
the eye is presented, and first results obtained with that technique are shown. 
First, a brief outline of image registration is presented, because the images to 
be compared must be aligned beforehand. Next, the system developed in our 
laboratory to test the accuracy of our method is described, followed by an 
experiment run in collaboration with the University Hospital of Santiago and the 
results obtained. Finally, conclusions and future lines are included as a closing 
section. 

2 Methodology 

2.1 Image Registration 

In many cases it is almost impossible to acquire the biometric parameter under the 
same conditions as the stored template used for the authentication, so a first 
step of normalization of both parameters (the acquired one and the reference one) 
is needed to make the system reliable enough, avoiding the rejection of legitimate 
users because of changes in illumination, translations or rotations in the image. 
The main drawback of retinal angiographies is the varying position of the vessels 
used in the authentication, because it is very difficult for the user to place 
the eye in the same position in different acquisitions; an alignment is therefore 
necessary prior to the authentication. To perform that alignment, an image 
registration algorithm is employed. 

Image registration consists in estimating the transformation T (we will only 
consider affine transformations) that aligns two images so that the points in one 
image can be related to points in the other. To determine the optimal 
transformation, an iterative process is performed that optimizes a similarity 
measure. 

There are many image registration methods (see [8] [9] for complete surveys). The 
registration method developed for aligning the images used in the authentication 
process has been described at length in [10] [11], but for convenience a brief 
outline is included in the following subsection. 

Creaseness based registration method. Vessels can be thought of as creases 
(ridges or valleys) when images are seen as landscapes. Among the many 
definitions of crease, the one based on level set extrinsic curvature (LSEC) has 
useful invariance properties. Given a function $L : \mathbb{R}^d \to \mathbb{R}$, 
the level set for a constant $l$ consists of the set of points 
$\{x \mid L(x) = l\}$. For 2D images, L can be considered as a topographic relief 
or landscape and the level sets are its level curves. Negative minima of the 
level curve curvature $\kappa$, level by level, form valley curves, and positive 
maxima ridge curves. 

$$\kappa = (2 L_x L_y L_{xy} - L_y^2 L_{xx} - L_x^2 L_{yy})(L_x^2 + L_y^2)^{-3/2} \qquad (1)$$ 

However, the usual discretization of LSEC is ill-defined in a number of cases, 
giving rise to unexpected discontinuities at the center of elongated objects. 
Instead, we have employed the MLSEC-ST operator, as defined in [12]. This 
alternative definition is based on the divergence of the normalized vector field 
$w$: 

$$\kappa = -\operatorname{div}(w) \qquad (2)$$ 

Although equations (1) and (2) are equivalent in the continuous domain, in the 
discrete domain, when the derivatives are approximated by centered finite 
differences of the Gaussian-smoothed image, equation (2) provides much better 
results. 
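
For illustration, the following minimal Python sketch evaluates the level curve 
curvature of equation (1) at an interior pixel, using centered finite differences. 
It shows only the usual discretization of LSEC, not the MLSEC-ST operator of [12]; 
it omits the Gaussian smoothing mentioned above, and the function name and the 
small `eps` guard for flat regions are assumptions. 

```python
def lsec(L, y, x, eps=1e-12):
    """Level set extrinsic curvature at interior pixel (y, x) of image L."""
    # Centered finite differences for first and second derivatives.
    Lx = (L[y][x + 1] - L[y][x - 1]) / 2.0
    Ly = (L[y + 1][x] - L[y - 1][x]) / 2.0
    Lxx = L[y][x + 1] - 2.0 * L[y][x] + L[y][x - 1]
    Lyy = L[y + 1][x] - 2.0 * L[y][x] + L[y - 1][x]
    Lxy = (L[y + 1][x + 1] - L[y + 1][x - 1]
           - L[y - 1][x + 1] + L[y - 1][x - 1]) / 4.0
    grad2 = Lx * Lx + Ly * Ly
    if grad2 < eps:
        return 0.0  # curvature is undefined where the gradient vanishes
    numerator = 2.0 * Lx * Ly * Lxy - Ly * Ly * Lxx - Lx * Lx * Lyy
    return numerator / grad2 ** 1.5
```

For the paraboloid $L(x, y) = x^2 + y^2$ the level curves are circles of radius 
$r$, and the formula gives $\kappa = -1/r$, a quick check of the sign convention 
used in equation (1). 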

After the extraction of the vessel landmarks (see Figure 1(c) and (d)), the most 
straightforward approach is to perform an iterative optimization of some 
alignment function: one image is taken as reference, while the other is 
iteratively transformed until the function attains a hopefully global maximum. 
The Downhill Simplex iterative algorithm, as implemented in [13], was selected as 
the optimization method, and linear correlation as the alignment function. 

But this straightforward approach works only for almost-aligned images with 
identical content; the common case is that the optimization gets trapped in a 
local maximum. Therefore, some sort of exhaustive search for the most promising 
seeds must be performed before the Simplex search starts. An efficient way to do 
this is in the Fourier domain, using a well-known property that relates a 
multiplication in this domain to the values of linear correlation. Furthermore, 
to overcome the time bottleneck of this computation, we build a pyramid for each 
image, in which each level is a sampled version of a local maximum of the 
previous level. The exhaustive search is computed only on the top (smallest) 
image, which greatly reduces the computation time. The method is described in 
more detail in [14] [10]. 



Fig. 1. Two examples of retinal angiographies, where variation between individuals can 
be seen. Images in (c) and (d) depict the extracted vessels of (a) and (b) respectively. 
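
The image pyramid used for the exhaustive search can be illustrated with a minimal 
Python sketch in which each level keeps the maximum of every 2x2 block of the 
previous level. The block size, the sampling factor of 2 and the function names 
are assumptions; the text specifies only that each level is a sampled version of 
a local maximum of the previous one. 

```python
def max_pool_level(image):
    """Halve the image size, keeping the maximum of each 2x2 block."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[max(image[2 * r][2 * c], image[2 * r][2 * c + 1],
                 image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1])
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, levels):
    """Return [level 0 (full size), level 1, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(max_pool_level(pyramid[-1]))
    return pyramid
```

Taking the local maximum rather than plain subsampling keeps thin crease responses 
visible at the coarse levels, where the exhaustive search is run. 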



2.2 Retinal Based Authentication 

Once the registration process has been performed and the images are aligned, the 
extracted, registered crease images are used to obtain a similarity measure 
between them. If two images belong to the same person, the aligned crease images 
will be more similar than images from different persons, even when the 
registration process has been performed successfully. The measure employed must 
be robust against changes in image amplitude, such as those caused by changing 
lighting conditions, and also against the number of points obtained in the crease 
extraction process. These conditions are fulfilled by the Normalized 
Cross-Correlation coefficient (NCC), which is defined as [15]: 

$$\gamma = \frac{\sum_{x,y} [f(x,y) - \bar{f}]\,[t(x,y) - \bar{t}]}{\sqrt{\sum_{x,y} [f(x,y) - \bar{f}]^2 \, \sum_{x,y} [t(x,y) - \bar{t}]^2}} \qquad (3)$$ 

where $\bar{t}$ is the mean of the registered image, and $\bar{f}$ is the mean of 
the image. Note that although the sums run over the whole images, only their 
overlapping areas are non-null (as depicted in Figure 2, where the original and 
the registered images are shown). 
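
As an illustration of equation (3), the following minimal Python sketch computes 
the normalized cross-correlation coefficient of two aligned images of the same 
size, stored as lists of rows. The function name and the convention of returning 
0 for a constant image are assumptions. 

```python
import math


def ncc(f, t):
    """Normalized cross-correlation coefficient of two equally sized images."""
    fv = [v for row in f for v in row]
    tv = [v for row in t for v in row]
    f_mean = sum(fv) / len(fv)
    t_mean = sum(tv) / len(tv)
    numerator = sum((a - f_mean) * (b - t_mean) for a, b in zip(fv, tv))
    denominator = math.sqrt(sum((a - f_mean) ** 2 for a in fv)
                            * sum((b - t_mean) ** 2 for b in tv))
    # A constant image has zero variance; return 0 by convention (assumption).
    return numerator / denominator if denominator else 0.0
```

Because both images are centered on their means and the result is divided by 
their energies, a linear change of amplitude such as `2 * f + 5` leaves the 
coefficient unchanged, which is the robustness to lighting required above. 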

Once the normalized correlation coefficient $\gamma$ has been calculated, a 
confidence measure must be determined to decide whether two images belong to the 
same person. To avoid false acceptances caused by errors in the acquisition, where 
only small creases can be extracted, an acquired image is considered valid for 
the authentication algorithm only if the number of points in the creases is above 
a minimum number of points. That threshold is obtained by applying the 
Tchebycheff theorem [16]: if the number of points in the creases, $N_c$, fulfills 
$N_c > 3\sigma$, where $\sigma$ is the mean number of points in the creases of a 
set of well-acquired images, then the image is considered valid for the system; 
if $N_c < 3\sigma$, the image is rejected. 

Fig. 2. (a) Original crease image, including the non-overlapping area, (b) cropped 
original crease image with only the overlapping area and (c) registered crease 
image. Only the overlapping areas of the images are non-null. 

3 Method Validation and Results 

The images used in our experiments were acquired over a period of 15 months at 
different centers of the University Hospital of Santiago de Compostela (CHUS), 
all of them with the same camera, a Canon CR6-45NM Non-Mydriatic Retinal Camera, 
with a resolution of 768x584 pixels. Although they were originally color images, 
they were converted to gray-level images before being stored in the database, 
since color does not provide any useful information. 

First experimental results showed that the NCC value for images
belonging to the same individual, although acquired at different times, is always
above 0.6. In those first experiments, a set of 4 images from each of 5 different
persons (20 images) was evaluated by the system.

To test the reliability of our system, a larger blind experiment was designed in
collaboration with the CHUS: a set of 119 retinal angiographies was introduced
into the system. Two kinds of images could be found in the benchmark: most
of the images (110) belonged to different individuals, and a small number
of them (6 from 3 individuals, 2 of each) were images of the same persons
taken at different times. The system should be able to find the images in the
benchmark which belong to the same persons.

In the test, the NCC was calculated for the Cartesian product of the set of 116
images (three images were eliminated from the total of 119 because they
presented very poor contrast, so their creases were too small and they were rejected
by the system as described above). The NCC values of the remaining images
were normalized to the interval [0, 1], as can be seen in Figure 3. The values
on the diagonal of that image are all 1, since they correspond to the
correlation of the images with themselves. The other values belong to the other
two categories: values greater than 0.6 are obtained when correlating images of
the same person acquired at different times, and the rest of the values,
all of which are under the peak value 0.35, correspond to the
NCC of images of different individuals. Thus, to verify the identity of an
individual, the system only has to search the set of stored templates, and if
the NCC is below a confidence level for all the correlation values,
the person will not gain permission to enter the protected area or to read the
information.

The confidence level is a very important parameter in the system,
since too low a level would lead the system to accept even false individuals, but
too high a level would reject legitimate individuals. Figure 4 shows the percentages
of false rejection and false acceptance cases. It can be clearly seen in that figure
that up to a threshold value of 0.60 the true positive percentage is 1, meaning
that no true positive is rejected. From that point up to a threshold of 1, the only
acceptance cases are the NCC values of each image with itself,
which are always 1.0. On the opposite side, when the threshold goes down, false
acceptance cases do not appear until its value is 0.35, growing exponentially from
that point. From these values, when the threshold is in the range from 0.35
to 0.60 the success rate of the system is 100%.
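
The threshold analysis described above can be sketched as follows. The scores are invented for illustration and are not the data of the experiment; the function name `error_rates` and the rule "accept when the score is greater than or equal to the threshold" are assumptions of this example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.

    genuine:  NCC scores of image pairs from the same person.
    impostor: NCC scores of image pairs from different persons.
    A pair is accepted when its score is >= threshold (assumed rule).
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far

# Made-up scores that mimic the separation reported in the text:
# genuine pairs above 0.6, impostor pairs below 0.35.
genuine_scores = [0.62, 0.71, 0.85, 0.66]
impostor_scores = [0.05, 0.12, 0.33, 0.21, 0.30]

for th in (0.2, 0.35, 0.5, 0.6, 0.7):
    print(th, error_rates(genuine_scores, impostor_scores, th))
```

Any threshold placed inside the gap between the two score populations gives zero false rejections and zero false acceptances; moving it below the gap starts accepting impostor pairs, and moving it above starts rejecting genuine pairs.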




Fig. 3. Two views of a graph representing the correlation values obtained in the
experiment with 119 images. The main diagonal is always 1, since it corresponds to the NCC of
each image with itself, and the other peaks with value 0.6 correspond to the correlation
of images from the same person taken at different times.



All the conclusions presented in this work were checked by the expert clinicians
of the CHUS, who knew before the experiment was performed which
images belonged to the same individuals and which did not. They concluded that the
results were right, that matching images were indeed taken from the
same patients, and that there were no false rejections, so the system achieved, in these
first tests, a 100% success rate.

Fig. 4. Percentages of false acceptance and false rejection when the threshold level is
varied.



4 Conclusions and Future Lines 

A novel authentication method has been presented here. The authentication
procedure employs the retinal vessel tree as the biometric parameter, with a prior
registration stage needed to align the template image and the acquired image.
To measure the similarity between the images, the Normalized Cross-Correlation of
the aligned creases extracted from the images is used. The technique has been
extensively tested, with a test that involved 14,161 cases, giving very good
results. It must be noted that the registration method employed here is coherent
[17], since the result obtained when image $I_1$ is registered against
$I_2$ is the same as the result obtained using $I_2$ as the reference image (Figure 3).
From that experiment, it can be concluded that the NCC can be used as a robust
measure of the similarity of the images, with values over 0.6 for the 127 cases of
images from the same person, and values under 0.35 for the 14,034 cases which
belong to different individuals. Moreover, an analysis of the behavior of the
system when the acceptance threshold is varied is presented, showing
that a wide band of 0.35 in the NCC appears between the acceptance area and
the rejection area. The mean time taken to perform each image authentication
is 0.3 seconds (0.26 seconds for the registration and 0.04 seconds for
the computation of the NCC value), so the method is well suited to
a real authentication system.

Future research will include the development of a hardware system based on
the technique presented here, which will improve performance to almost
real-time authentication.



Acknowledgements. This paper has been partly funded by the Xunta de 
Galicia through the grant contract PGIDT01PXI10502PR. 






References 

1. J.G. Daugman. Biometric personal identification system based on iris analysis.
United States Patent No. 5,291,560, 1994.

2. Federal Bureau of Investigation. The science of fingerprints: Classifications and
uses. Technical report, U.S. Government Printing Office, Washington D.C., 1984.

3. A. Jain, L. Hong, S. Pankanti, and R. Bolle. An identity authentication system
using fingerprints. Proceedings of the IEEE, 85(9), September 1997.

4. R. Zunkel. Hand geometry based verification. In BIOMETRICS: Personal
Identification in Networked Society. Kluwer Academic Publishers, 1999.

5. W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips. Face recognition: A literature
survey. Technical report, National Institute of Standards and Technology, 2000.

6. J. Bigun, G. Chollet, and G. Borgefors, editors. Proceedings of the 1st International
Conference on Audio- and Video-Based Biometric Person Authentication,
Crans-Montana, Switzerland, March 1997.

7. P. Verlinde, G. Chollet, and M. Acheroy. Multi-modal identity verification
using expert fusion. Information Fusion, 1(1):17-33, 2000.

8. L.G. Brown. A survey of image registration techniques. ACM Computing Surveys,
24(4):325-376, 1992.

9. J.B.A. Maintz and M.A. Viergever. A survey of medical image registration. Medical
Image Analysis, 2(1):1-36, 1998.

10. D. Lloret, C. Marino, J. Serrat, A.M. Lopez, and J.J. Villanueva. Landmark-based
registration of full slo video sequences. In Proceedings of the IX Spanish Symposium
on Pattern Recognition and Image Analysis, volume I, pages 189-194, 2001.

11. C. Marino, M. Penas, M.G. Penedo, D. Lloret, and M.J. Carreira. Integration of
mutual information and creaseness based methods for the automatic registration
of slo sequences. In Proceedings of the SIARP'2001, VI Simposio Ibero-Americano
de Reconhecimento de Padroes, volume I, 2001.

12. A. Lopez, D. Lloret, J. Serrat, and J.J. Villanueva. Multilocal creaseness based
on the level set extrinsic curvature. Computer Vision and Image Understanding,
77:111-144, 2000.

13. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C.
Cambridge University Press, 2nd edition, 1992.

14. D. Lloret, A. Lopez, J. Serrat, and J.J. Villanueva. Creaseness-based CT and
MR registration: comparison with the mutual information method. Journal of
Electronic Imaging, 8(3):255-262, July 1999.

15. J.P. Lewis. Fast template matching. Vision Interface, pages 120-123, 1995.

16. S. Ehrenfeld and S.B. Littauer. Introduction to Statistical Method, page 132.
McGraw-Hill, 1964.

17. G.E. Christensen and H.J. Johnson. Consistent image registration. IEEE Trans. on
Medical Imaging, 20(7):568-582, July 2001.



Suboptimal Classifier for Dysarthria Assessment 



Eduardo Castillo Guerra¹ and Dennis F. Lovely²

¹ Centre for Studies on Electronics and Information Technologies, Central University "Marta
Abreu" of Las Villas, Carr. Camajuani Km 5½, Santa Clara, VC, Cuba, 50100
ecastillo@fie.uclv.edu.cu

² Department of Electrical Engineering, University of New Brunswick, Fredericton, NB,
E3B5A3, Canada.
lovely@unb.ca



Abstract. This work is focused on the design and evaluation of a suboptimal
classifier for dysarthria assessment. The classification relies on self-organizing
maps to discriminate 8 types of dysarthria and a normal group. The classification
technique provided excellent accuracy for assessment and gave clinicians
a powerful relevance analysis of the input features. This technique
also produces a bi-dimensional map that shows the spatial distribution of the data,
revealing important information about the different dysarthric groups.



1 Introduction 

Dysarthria is a collective name given to a group of neurological diseases that
originate from lesions in the peripheral or central nervous system. The location of the
lesions determines the perturbation induced on the speech signal. Therefore, there is a
relationship between the speech perturbations observed and the type of dysarthria.

The assessment of dysarthria is often performed based on features extracted from 
recorded speech which reflect the perturbation patterns described by the different 
types. The goal for dysarthria assessment is to obtain high sensitivity in the discrimi- 
nation process while allowing clinicians to perform backward analysis of the feature 
contribution to the final decision of the classifier. This analysis is necessary to estab- 
lish a correlation between the most prominent features in each dysarthric group and 
the neurological damage. 

The main limitations encountered in dysarthria assessment can be defined as: the
unavailability of objective measures that efficiently describe the speech perturbations,
the lack of a gold standard to compare the different techniques developed, and the few
databases available for research. The first limitation is currently being
studied exhaustively, and new digital signal processing algorithms are being
developed to describe more accurately those perturbations used as clues for assessment [1],
[2], [3], [4], [5], [6]. The second limitation is still an unsolved problem in the research
community due to the existence of different severity levels of the speech perturbation,
the disagreement among researchers about the best set of features to use in the
assessment, and the interrelation between perturbations. However, steps have been taken
to address this limitation with the workshop on acoustic voice analysis [6]. The last
limitation is a critical problem due to the difficulty of recording a large population of
subjects with these diseases. New pathological speech databases are now available for
research but are not focused only on dysarthric patients; therefore, they provide
limited information [7]. As a consequence of these limitations, the development of
efficient classification techniques for this application is highly desirable and necessary.



2 Experiences in Dysarthria Assessment 

Several protocols have been used for dysarthria assessment considering a variety of 
descriptive features. The diagnosis of the dysarthria has been traditionally performed 
by the differential diagnosis of dysarthria [8]. This diagnosis method relies on per- 
ceptual judgments (PJ) of the pathological speech as the main descriptors. The authors 
defined 38 features or dimensions that describe more efficiently the speech perturba- 
tions. The judgments are grouped into clusters according to the speech mechanism 
affected and the combination of the clusters exhibited determines the type of dysar- 
thria. The decision is based on minimal distance between the clusters manifested and 
the combination of clusters that characterize each type of dysarthria. 

A survey on the use of this assessment method showed that more
than 60% of clinicians in North America use this system in their clinical practice.
However, limitations have been reported in the effectiveness of this method for assessing
subjects with mixed dysarthrias [9]. The PJ can also be imprecise and inconsistent when
certain speech features are analyzed, particularly when they are performed by
clinicians who come from different schools and have different reference points. This way
of judging often leads to low reliability and repeatability of the process of describing
the speech perturbations, causing low assessment rates and difficulties in standardizing
the results of the research in this area.

Other assessment methods, summarized in [10], describe similar protocols based on 
linear analyses of different sets of features. The definition of the descriptive features is 
also different but provides information of similar speech perturbations. Most of these 
protocols rely also on perceptual judgments of speech and suffer from similar limitations 
as the traditional method. Some assessment protocols reported perform the discrimina- 
tion between classes using linear discriminant analysis (LDA). This approach provides 
better performance than the traditional clustering method. However, the relevancy analy- 
sis of the input feature has been found imprecise [3]. 

More recent studies reported by Callan et al. (1999) [11] implemented the assessment
of a few dysarthric groups using a small set of objective measures and self-organizing
maps (SOM). Despite the small number of subjects and dysarthria groups assessed, the
study revealed the effectiveness of the method and opened new options in the assessment
of dysarthria.

Nowadays, the trend in dysarthria assessment is to use different tools to provide
objective measures of the speech perturbations and to implement non-linear classification
techniques to differentiate the groups [5]. This approach could lead to better and
more consistent judgments. However, the backward analysis of the classification
decision is an important requirement that the classification method has to meet to
provide a global picture of these diseases.



3 Self-Organizing Maps



A self-organizing map is an unsupervised neural network that learns to recognize
regularities and correlations in its input data and adapts future responses according to
that input. This network (see block diagram in Fig. 1) not only learns to recognize
groups of similar input vectors; neighboring neurons also learn to recognize
neighboring sections of the input space. Therefore, the SOM learns both the distribution and
the topology of the input vectors on which it is trained.

Fig. 1. Diagram of a self-organizing map network



SOMs are made up of an input layer that interfaces with the multidimensional features
and a representational layer. This second layer is a two-dimensional array of nodes with
an associated weight matrix. A distance box is used to estimate the negative distance
between the input vectors (V) and the weight matrix. This identifies the neuron that most
likely represents the characteristics of the input case (the winning neuron). The competitive
level produces a one for the output element corresponding to the winning neuron while
all other elements are set to zero. However, neurons close to the winning neuron are
updated along with the winning neuron using the Kohonen learning rule [12], expressed
as:



$$m_i(t+1) = m_i(t) + \alpha(t)\,[x(t) - m_i(t)] \qquad (1)$$

where $m_i(t)$ is the vector of connection weights of node $i$ at time step $t$, $x(t)$ is the
input vector at time step $t$, and $\alpha(t)$ is the learning rate at time step $t$.
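
A minimal sketch of one training step with the learning rule above is given below. It assumes Euclidean distance for selecting the winning neuron, a bubble neighbourhood (every node within a given radius of the winner is updated with the same rate) and plain grid coordinates for the nodes; the names `best_matching_unit` and `kohonen_step` are illustrative only.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to the input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def kohonen_step(weights, grid, x, alpha, radius):
    """Apply one Kohonen update in place and return the winner's index.

    weights: list of weight vectors m_i (one per node).
    grid:    list of node positions on the map, e.g. (row, col) tuples.
    alpha:   learning rate for this time step.
    radius:  bubble neighbourhood radius measured on the map grid.
    """
    bmu = best_matching_unit(weights, x)
    for i, pos in enumerate(grid):
        # Bubble neighbourhood: update the winner and all nodes near it.
        if math.dist(pos, grid[bmu]) <= radius:
            weights[i] = [m + alpha * (xi - m) for m, xi in zip(weights[i], x)]
    return bmu

# Tiny example: three nodes on a line, one 2-D input vector.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
winner = kohonen_step(weights, grid, [0.0, 0.0], alpha=0.5, radius=1.0)
```

In this example the first node wins, its neighbour at grid distance 1 moves halfway towards the input, and the node at grid distance 2 is left untouched.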

The characteristics described previously allow this type of network not only to learn the
distribution of the input vectors but also to gain information regarding the neighborhood.
The way that neighboring neurons are updated depends on the topology selected, which
can be rectangular, hexagonal or random. The weight values are determined
by an unsupervised learning algorithm, which offers the advantage that a gold standard
target is not required.

The network iteratively learns the distribution of the input vectors as the data are
presented and the weighted connections of the representational layer are corrected. Each
winning neuron is obtained by determining which of the nodes in the
representational layer is closest to the input vector, according to the distance criterion [12].



3.1 Design of the Classifier 

A SOM network designed for this type of application requires two layers, an input layer
and the representational layer (Fig. 1). The input layer contains 20 input neurons
corresponding to the 20 observations obtained from perceptual and acoustic analysis¹,
described more precisely in [3]. The representational layer consisted of a 9-by-9-node layer
with a hexagonal lattice configuration. A bubble neighborhood function was used in
the training phase as recommended by Kohonen (1995) [12]. This bubble function is
reported to provide a configuration that can show better visual information in the map.

Fifteen SOMs were trained using different initial random weights, and the
configuration that provided the lowest overall quantization error was selected for further studies.
The SOMs were trained in two steps as recommended by Kohonen (1995): the ordering
phase and the convergence phase. The ordering phase refers to the task of ordering the
reference vectors. In this phase, the neighborhood radius is close to the diameter of the
map and is decreased during the training. The learning rate is large and decreases toward
zero as the network is trained. The initial radius used in the network was 9, with a
learning factor of 0.09 and 2000 iterations. This phase established a gross association between
the nodes and the input vectors.

The convergence phase is the step in which the reference vector on each node con- 
verges to an ‘optimal’ location. The radius and the learning rate are usually smaller in 
this phase while the number of iterations is usually larger than in the ordering phase. The 
radius used in this phase of the design decreased from 1 to 0 with a learning rate of 0.008 
decreasing to 0 as well. The number of iterations in this phase was 52000 to allow time 
for the convergence to the optimal position. This phase allows a fine tuning between the 
vectors and the nodes. The SOM_PAK software [13] was used for both phases of the 
training. 
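
The two-phase schedule described above can be written down compactly. The sketch below uses the starting values and iteration counts given in the text; the linear decay law and the end radius of the ordering phase (1.0) are assumptions made for this example, since the text does not state them.

```python
# Start values and iteration counts come from the text. The linear decay
# and the ordering-phase end radius of 1.0 are assumptions of this sketch.
PHASES = {
    "ordering":    {"iterations": 2000,  "radius": (9.0, 1.0), "alpha": (0.09, 0.0)},
    "convergence": {"iterations": 52000, "radius": (1.0, 0.0), "alpha": (0.008, 0.0)},
}

def linear_decay(start, end, step, total_steps):
    """Linearly interpolate from start (step 0) to end (step == total_steps)."""
    return start + (end - start) * step / total_steps

def schedule(phase_name, step):
    """Return (radius, alpha) for the given phase and iteration index."""
    phase = PHASES[phase_name]
    n = phase["iterations"]
    radius = linear_decay(*phase["radius"], step, n)
    alpha = linear_decay(*phase["alpha"], step, n)
    return radius, alpha
```

For instance, `schedule("ordering", 0)` returns the initial radius 9 and learning rate 0.09, while halfway through the convergence phase the radius has dropped to 0.5 and the learning rate to about 0.004.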



3.2 Evaluation of the Performance of the Classifier 

The performance of the classifier was evaluated using a set of 127 subjects from 9
target classes (AD: Ataxic dysarthria, ALS: Amyotrophic Lateral Sclerosis, FD:
Flaccid dysarthria, HC: Chorea, HD: Dystonia, OVT: Organic Voice Tremor, HP:
Parkinson's disease, SD: Spastic dysarthria and NS: Control Group). A cross-validation
technique was used to prevent an overly optimistic classification rate. This technique
works by omitting one subject's data, then retraining the neural network using the
remaining data and finally classifying the omitted observation. In this way, more data
samples participate in the network training and a more realistic performance measure
is obtained. The main disadvantage of this method is that a set of networks is obtained
from the training process. However, the network that provided the lowest overall
quantization error was kept as the most representative network. The distribution of the
vectors across the SOM map for this network is shown in Fig. 2.

¹ Observations: Pitch level, pitch break, tremor, excess of loudness variation, harsh voice,
breathy voice, voice stoppages, audible inspirations, speech rate, short phrases, short rushes
of speech, monoloudness, hypernasality, reduced stress, variable rate, prolonged intervals,
inappropriate silences, excess or equal stress, articulatory breakdowns and distorted vowels.
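
The leave-one-out cross-validation used in this section can be sketched as follows. To keep the example self-contained, a simple 1-nearest-neighbour classifier stands in for the retrained SOM; `train_1nn` and `predict_1nn` are placeholders introduced here and are not the classifier used in the study.

```python
import math

def leave_one_out_accuracy(samples, labels, train, predict):
    """Omit each sample in turn, retrain on the rest, classify the omitted one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Stand-in classifier (assumption): 1-nearest neighbour on raw features.
def train_1nn(xs, ys):
    return list(zip(xs, ys))

def predict_1nn(model, x):
    return min(model, key=lambda pair: math.dist(pair[0], x))[1]

# Toy data: two well-separated groups in one dimension.
toy_samples = [[0.0], [0.1], [1.0], [1.1]]
toy_labels = ["a", "a", "b", "b"]
accuracy = leave_one_out_accuracy(toy_samples, toy_labels, train_1nn, predict_1nn)
```

Note that one model is trained per omitted subject, which is why the procedure yields a set of networks rather than a single one.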




Fig. 2. Distribution of the classified groups across the SOM. The dots represent the nodes of the 
9x9 map implemented and the lines represent the Euclidean distances between each node. The 
colored areas enclose the most probable neurons for each group 



It is observed that the SOM performed a good separation between the different
groups. The shaded areas represent the zones in which a larger number of nodes of
each class tends to concentrate. The normal group is the most concentrated in its area,
with HP, AD, HD, and OVT having very well defined areas. As expected, the HC
group is observed close to HD, since both are hyperkinetic dysarthrias sharing common
typical dimensions. Similarly, the areas of the groups FD and SD overlap because
both share many similar dimensions. It is noticeable that there are some ALS neurons
in between the areas defined for FD and SD. This is explained from the clinical
viewpoint by the fact that ALS is a mixed dysarthria made of a combination of FD
and SD.

The confusion matrix in Table 1 shows the performance of the classification
technique. The lowest classification rates are observed for the HD and HC groups due
to the great overlap among their speech deviations. It is observed that although the ALS
class was the most scattered across the map, its percent of correct classification (PCC)
was not the worst. The quantization error of each sample of the dataset with respect
to the selected map was 1.916 for the network that performed worst.

Table 1. Confusion matrix of SOM classifier after cross-validating the dataset

Group      True Groups
           AD      ALS     FD      HC      HD      OVT     HP      NS      SD
AD         11      0       0       1       1       0       1       0       0
ALS        0       10      1       2       1       0       0       0       0
FD         2       0       12      0       1       1       0       0       0
HC         0       0       0       8       0       1       0       0       0
HD         0       0       0       2       10      0       0       0       0
OVT        0       1       0       0       0       11      0       0       0
HP         0       0       0       0       0       0       15      0       0
NS         0       0       0       0       0       0       0       19      0
SD         0       2       0       0       1       0       0       0       13
Total      13      13      13      13      14      13      16      19      13
Correct    11      10      12      8       10      11      15      19      13
PCC        0.846   0.769   0.923   0.615   0.714   0.846   0.938   1.00    1.00

N total = 127     N Correct = 109     Proportion Correct = 0.8583





The total PCC obtained with the SOM method is 0.86, which outperformed the re- 
sults obtained with the traditional method (0.66) and the LDA method (0.81). The 
difference in performance between the SOM and the LDA methods is not very signifi- 
cant since some of the input features were measures made of linear combination of 
different objective algorithms [3]. However, the SOM provided a more reliable rele- 
vancy analysis of the input features and provided the bi-dimensional map. 
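
The per-group and overall PCC values follow directly from the confusion matrix. The short sketch below recomputes them from the counts of Table 1 (rows are the groups assigned by the classifier, columns are the true groups); the function names are illustrative.

```python
GROUPS = ["AD", "ALS", "FD", "HC", "HD", "OVT", "HP", "NS", "SD"]

# Counts from Table 1: CONFUSION[assigned][true].
CONFUSION = [
    [11, 0, 0, 1, 1, 0, 1, 0, 0],
    [0, 10, 1, 2, 1, 0, 0, 0, 0],
    [2, 0, 12, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 8, 0, 1, 0, 0, 0],
    [0, 0, 0, 2, 10, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 11, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 15, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 19, 0],
    [0, 2, 0, 0, 1, 0, 0, 0, 13],
]

def pcc_per_group(matrix):
    """Proportion of correctly classified subjects for each true group."""
    n = len(matrix)
    totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [matrix[c][c] / totals[c] for c in range(n)]

def overall_pcc(matrix):
    """Return (number correct, total subjects, overall proportion correct)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total

for name, pcc in zip(GROUPS, pcc_per_group(CONFUSION)):
    print(f"{name}: {pcc:.3f}")
print(overall_pcc(CONFUSION))
```

The diagonal sums to 109 correct classifications out of 127 subjects, which gives the overall proportion of 0.8583 reported in the table.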



4 Relevance Analysis of the Dataset Features 

The contribution of the dimensions to the differentiation of the dysarthric groups can be
explained with the use of SOM networks. Explaining this contribution has been the main
drawback of many non-linear analyses, such as those performed with some types of ANN, in which the
relevancy of the observations is not properly understood. The SOM emerged from a
vector quantization algorithm that places a number of reference codebook vectors into a
high-dimensional input data space, forming an organized approximation of the dataset
structure. The self-organizing algorithm that shapes this structure can be analyzed as a
non-linear regression of the reference vectors through the data points [12]. Therefore, the
weight vectors of the nodes corresponding to each group can provide information on the
relevancy of the input dimensions to the group. Each neuron will have a set of weights
characterizing each target group, with a weight associated with each dimension.

Based on the previous explanation, the neurons closer to the centroid of each group 
will have weights associated with similar characteristics to the mean values of each 
group. The magnitude and sign of these weights will provide a clinically valuable rele- 
vance indicator. 
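
One possible reading of this procedure is sketched below: find the node whose weight vector is closest to the centroid of a group and rank the dimensions by the magnitude of that node's weights. This is only an illustration under the assumption that the input features are standardized so that weights are comparable across dimensions; the function names and the toy numbers are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def rank_dimensions(group_samples, node_weights, names):
    """Rank feature names by |weight| of the node nearest to the group centroid."""
    c = centroid(group_samples)
    nearest = min(node_weights, key=lambda w: math.dist(w, c))
    return sorted(zip(names, nearest), key=lambda p: abs(p[1]), reverse=True)

# Toy example with three made-up standardized features and two map nodes.
names = ["HN", "PB", "SP"]
group = [[0.9, -0.1, 0.4], [1.1, 0.1, 0.6]]
nodes = [[1.0, 0.05, 0.5], [-1.0, 0.8, 0.0]]
ranking = rank_dimensions(group, nodes, names)
```

Here the first node is closest to the group centroid, so the ranking is driven by its weights; the sign of each weight indicates the direction of the deviation from the overall mean.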

Table 2 shows an example of the most relevant dimensions obtained for the
classification of the FD group. The most clinically relevant dimensions found with the
traditional and linear discriminant analyses are also shown for comparison.



Table 2. Relevance analysis for the traditional method of dysarthria assessment (PA), linear
discriminant analysis (LDA) and self-organizing map (SOM). * indicates coincidence in all studies

Analysis   ORDER
           1     2    3    4    5    6     7    8     9    10    11    12   13
PA         HN    IC   M    AI   SP   ML    BV   NE    HV   R     -     -    -
LDA        HN    DV   RS   ML   IS   IAB   SP   EES   VR   ELV   VST   R    AI
SOM        HN*   SP   BV   R*   AI*  ML*   PI   PB    HV   EES   -     -    -



The analysis shows, in agreement with the other methods, the dimension hypernasality
as the most prominent feature in this group. This corresponds with reported
studies based on physiological analysis of this type of disease [10]. Short phrases (SP),
breathy voice (BV), rate (R), audible inspirations (AI), monoloudness (ML), harsh voice
(HV), prolonged intervals and excess of loudness variation are speech features also
typical of this dysarthric group. This analysis shows the dimension PB as relevant
although it was not found relevant in the previous studies. However, PB is often heard in
subjects with FD (e.g. Darley, Aronson & Brown heard PB in 5 of their 30 FD subjects
[14]).

The other methods, especially the LDA method, show features that are not commonly
seen in this type of disease, such as irregular articulatory breakdown (IAB), variable rate
(VR) and imprecise consonants (IC). These methods also partially disagree with other
physiological studies with respect to the relevancy order [10]. Similar relevancy analyses
can be implemented on the rest of the dysarthric groups. In all cases the SOM method
performed better than the other methods studied, demonstrating the feasibility of this
technique for performing the relevancy analysis of the input features.



5 Conclusions 

The results of the classification process reveal the convenience of using SOM for the
assessment of dysarthria. A comparison with the implementation of the traditional and
LDA classification methods shows that the SOM classifier outperformed the other two
methods by nearly 20% and 5% respectively. The SOM also learned the topology of
the data, producing a bidimensional map that provides more complete information
about the different dysarthric groups. The map obtained for the dysarthric database can
provide information not only about the type of disease or the contribution of the
observations, but also about the evolution of the subjects after treatment. This is always an
issue in providing objective testimonies of the disease progress.

The SOM technique also provides a more accurate relevancy analysis than the other
methods studied, acting as a non-linear regression algorithm. The relevancy analysis
implemented in the form explained in Section 4 is simpler and easier to understand
than the other methods reviewed. This is important to ensure that a system based on this
technique is used in regular practice by speech language pathologists. An assessment
tool implemented with this classification technique can also avoid the exposure to
radiation or high magnetic fields involved in analyses commonly performed on subjects with these
diseases.

Abstract. The nearest neighbour (NN) rule is widely used in pattern
recognition tasks due to its simplicity and its good behaviour. Many fast
NN search algorithms have been developed in recent years. However,
in some classification tasks an exact NN search is too slow, and a way
to speed up the search is required. To face these tasks it is possible to use
approximate NN search, which usually increases error rates but greatly
reduces search time.

In this work we propose using approximate NN search with an
algorithm suitable for general metric spaces, the Fukunaga and Narendra
algorithm, and its application to chromosome recognition. Also, to
compensate for the increase in error rates that approximate search
produces, we propose to use a recently proposed framework to
classify using k neighbours that are not always the k nearest neighbours.
This framework improves NN classification rates without extra time cost.

Keywords: Approximate Nearest Neighbour, Pattern Recognition, 
Chromosome Recognition. 



1 Introduction 

The nearest neighbour (NN) rule classifies an unknown sample into the class of
its nearest neighbour according to some similarity measure (a distance). Despite
its simplicity, classification accuracy is usually enough for many tasks. However,
some tasks may require finding the k nearest neighbours in order to improve
classification rates, thus the NN rule has been generalized to the k-NN rule [3].
Many classification tasks represent data as vectors and use one of the Minkowski
metrics as the distance, usually the L2 (Euclidean distance). However, there are
other tasks where a vector representation is not suitable, and thus other distance
measures are used: string distance, tree distance, etc.

Although heavily used in pattern recognition, the NN rules have also been of
interest in other fields such as data mining and information retrieval, which
usually involve searching in very large databases and dealing with high-dimensional
data. Whenever the classification task requires large training sets or expensive
distance measures, the simple exhaustive search for the NN becomes
impractical. To overcome some of these problems, a large number of fast NN search
algorithms [5,4,13,11,2,10] have been developed; most of them have been
easily extended to find the k-NN. However, the requirement of finding exactly the
k-NN involves higher computing effort (dependent on the value of k).

For some tasks finding exactly the NN (even using a fast NN search
algorithm) may become too slow; some approximate NN search algorithms [1] have
been proposed to face these tasks, yielding slightly worse classification rates but
obtaining much lower classification times.

Recently [9], a framework for approximate k-NN classification based on
approximation-elimination fast NN search algorithms has been proposed. The
main idea in that work is to modify an NN search algorithm so that it keeps a sorted array
of the prototypes whose distance to the sample has been computed during the
search (the selected prototypes), and to classify the sample by voting among the
nearest k prototypes found while searching for the NN, including the NN itself.
Those prototypes are called the k nearest selected prototypes (k-NSN).
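
The bookkeeping behind the k-NSN idea can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the implementation of [9]: the class name `NearestSelected`, the methods `offer` and `vote`, and the tie-breaking rule (the tied label that appears nearest wins) are all hypothetical choices.

```python
import bisect
from collections import Counter


class NearestSelected:
    """Keeps the k smallest (distance, label) pairs seen so far, sorted by distance.

    Labels must be mutually comparable (e.g. strings), because tuples with
    equal distances are ordered by label.
    """

    def __init__(self, k):
        self.k = k
        self.items = []  # sorted list of (distance, label)

    def offer(self, distance, label):
        # Called every time the search computes a distance to a prototype.
        bisect.insort(self.items, (distance, label))
        if len(self.items) > self.k:
            self.items.pop()  # drop the farthest entry

    def vote(self):
        # Majority vote; among tied labels, the one appearing nearest wins
        # (an assumed tie-breaking rule).
        if not self.items:
            return None
        counts = Counter(label for _, label in self.items)
        best = max(counts.values())
        for _, label in self.items:
            if counts[label] == best:
                return label
```

A search algorithm would call `offer` wherever it computes a distance and call `vote` once the search ends, so no distance computations are added to the search itself.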

In this work we have applied the ideas from [1] to the Fukunaga and Narendra
algorithm, which has been implemented using a priority queue to allow approximate
search. Then, to improve classification rates, we propose to use either the
k-NSN classification scheme or the k-NN scheme: the former improves
classification rates without increasing classification times, while the latter obtains better
classification rates than k-NSN but at an extra time cost that depends on the
value of k.



2 The Fukunaga and Narendra Algorithm Implemented 
Using a Priority Queue 

The algorithm of Fukunaga and Narendra [5] is a classic NN search algorithm.
In the preprocessing phase, a tree is built from the training set using some
hierarchical clustering algorithm. In [5] the k-means algorithm is suggested for
clustering the set at each level of the tree, and because of this suggestion the
Fukunaga and Narendra algorithm is often considered suitable only for Euclidean
spaces. However, if a more general clustering algorithm is used instead, the
search algorithm is suitable for any metric space.

In the tree, each non-leaf node p contains a representative $M_p$ of a set of
prototypes $S_p$, a radius $R_p$ (the maximum of the distances between $M_p$ and all
the other prototypes in $S_p$), and l children. Leaf nodes contain only a
representative $M_p$ and the set of prototypes $S_p$. The search phase traverses the tree
using a branch and bound scheme. At each node, the distances from the
representatives of its children to the sample are computed and stored. Given a child
p, the pruning condition is:

$$d_{nn} + R_p < d(x, M_p) \qquad (1)$$

where x is the sample and $d_{nn}$ is the distance to the nearest neighbour found
so far. For all non-pruned nodes, the search continues, starting with the nearest
child. When the node p is a leaf, all the prototypes stored in the node are tested:
if a prototype cannot be the nearest neighbour, it is pruned; otherwise, its distance
to the sample is computed and the nearest neighbour is updated if necessary.
Given a leaf node p, the pruning condition for a prototype $x_i \in S_p$ is:

$$d_{nn} + d(x_i, M_p) < d(x, M_p) \qquad (2)$$

Please note that $d(x, M_p)$ has been previously computed, and $d(x_i, M_p)$ is
computed and stored during the building of the tree, so this condition does not
involve new distance computations.

The original formulation of the Fukunaga and Narendra algorithm [5] is usually
reformulated in a more intuitive recursive way, but in this work we have
implemented it using a priority queue that allows for approximate search: after
computing all the distances to the children, all non-pruned nodes are stored in a
priority queue (similar to the one used in [1]), using $d(x, M_p) - R_p$ as the key for
the queue (see equation 1). Then, the closest element is extracted from the queue
and compared (again) with $d_{nn}$; if the key of the current node is greater than $d_{nn}$, the
search is finished, as all the nodes in the queue are farther from the sample than
the current nearest neighbour (see figure 1 for details).

The Fukunaga and Narendra algorithm can be extended to find exactly the
k-NN with a couple of simple modifications: first, let $d_{nn}$ be the distance to
the kth NN instead of the distance to the NN. Second, each time a distance
is computed, store it in a sorted array of the k-NN distances (if possible). As
the value of $d_{nn}$ in the pruning condition changes, the time spent by the
algorithm to find exactly the k-NN increases by an amount that depends on the
value of k.

3 Approximate Search and Classification

The condition labelled (a) in figure 1 is the condition to finish the search: if
the nearest (to the sample) element in the queue has a key m that is greater than
the current distance to the nearest neighbour $d_{nn}$, then the nodes in the queue
(including the one that has just been extracted) cannot contain the nearest
neighbour and the search may be finished.

Applying a technique similar to that in the work by S. Arya and D.M.
Mount [1], the condition (a) in figure 1 may be transformed into:

$$\text{if } (1 + \epsilon)\, m > d_{nn} \text{ or } \ldots \qquad (3)$$

This new condition (with ε > 0, obviously) allows the search to finish when
the current nearest neighbour is not too far from the true nearest neighbour. Using
this new condition, the search becomes faster, but the classification rate
becomes slightly worse. As may be expected, the faster the search, the worse
the classification rate, thus the choice of the value for ε should be a
trade-off between classification time and accuracy.
function pqsearch
  input  t (tree)
         x (unknown sample)
  output nn ∈ P (x's nearest neighbour in P)
begin
  insertPQ(t, 0)                  // insert the root of the tree in the queue
  endsearch := false ; d_nn := ∞
  while not endsearch do
    (t, m) := extractMinPQ()      // extract node t with minimum key m
(a) if m > d_nn or emptyPQ() then
      endsearch := true
    else
      for all p = Child(t) do
        let M_p be the representative of p, and R_p the radius of p
        d_p := d(x, M_p)
        if d_p < d_nn then        // updating nearest neighbour
          d_nn := d_p ; nn := M_p
        endif
        if d_p < d_nn + R_p then  // non-pruned child
          if Leaf(p) then
            for all prototypes x_i ∈ S_p do
              if d_p < d(x_i, M_p) + d_nn then
                d_xi := d(x, x_i)
                if d_xi < d_nn then   // updating nearest neighbour
                  d_nn := d_xi ; nn := x_i
                endif
              endif
            endfor
          else
            insertPQ(p, d_p - R_p)
          endif
        endif
      endfor
    endif
  endwhile
end pqsearch

Fig. 1. Fukunaga and Narendra algorithm using a priority queue
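
The search of figure 1, with the approximate stopping test of condition (3), might be sketched in Python as follows. This is a minimal sketch, not the authors' code: the `Node` class, the function name `pq_search`, the parameter `eps` and the choice of testing for an empty queue before extracting from it are assumptions made for illustration. Setting `eps = 0` gives the exact search.

```python
import heapq
import itertools
import math


class Node:
    """Tree node (hypothetical layout). Leaves have no children."""

    def __init__(self, rep, radius=0.0, children=(), prototypes=()):
        self.rep = rep                    # representative M_p
        self.radius = radius              # radius R_p
        self.children = list(children)    # empty for a leaf
        # For leaves: (x_i, d(x_i, M_p)) pairs, distances stored at build time.
        self.prototypes = list(prototypes)


def pq_search(root, x, dist, eps=0.0):
    """Return (nn, d_nn) using a priority queue keyed by d(x, M_p) - R_p."""
    d_nn, nn = math.inf, None
    tie = itertools.count()  # tie-breaker so that heapq never compares Node objects
    queue = [(0.0, next(tie), root)]
    while queue:
        m, _, t = heapq.heappop(queue)
        if (1.0 + eps) * m > d_nn:            # stopping test, condition (3)
            break
        for p in t.children:
            d_p = dist(x, p.rep)
            if d_p < d_nn:                    # update nearest neighbour
                d_nn, nn = d_p, p.rep
            if d_p < d_nn + p.radius:         # child not pruned by condition (1)
                if not p.children:            # leaf: test its prototypes
                    for x_i, d_i in p.prototypes:
                        if d_p < d_i + d_nn:  # not pruned by condition (2)
                            d_xi = dist(x, x_i)
                            if d_xi < d_nn:
                                d_nn, nn = d_xi, x_i
                else:
                    heapq.heappush(queue, (d_p - p.radius, next(tie), p))
    return nn, d_nn
```

Any metric can be passed as `dist`; for the string data used in the experiments it would be an edit distance.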



On the other hand, classification rates may be improved by using more than just
the nearest neighbour found in the search. If we use the k-NN, the search will
become slower, so we need a way to improve classification without increasing
classification time. In [9] it is shown that storing the closest k prototypes whose
distance to the sample is computed during a (non-approximate) NN search (the
k nearest selected neighbours, the k-NSN), and classifying the sample by voting
among these prototypes, significantly improves classification rates, yielding rates
similar to those of a k-NN classifier with the classification time of an NN classifier
(finding exactly the k-NN requires an extra overhead).

In this work we present some preliminary results of applying to the
Fukunaga and Narendra algorithm a combination of the two ideas above:
approximate search using ε to improve speed, and approximate k-NN classification
(that is, k-NSN classification) to improve classification rates
(approximate NN search usually produces higher error rates). Two main changes have
been made to the Fukunaga and Narendra algorithm: the use of a priority
queue in the search to allow approximate NN search, and the storage of the k nearest
prototypes visited during the search (the so-called k-NSN), in order to classify
the sample by voting among them.

4 Experiments 

We have carried out a set of experiments with a chromosome database [8,7,6] that
contains 4400 samples coded as strings. We have chosen the Levenshtein
distance [12] to measure the distance between two chromosomes in this task.
The database has been divided into two sets of 2200 samples each, and two
experiments have been performed, each using one of the sets for training and the other
for testing. The tree has been chosen to be a binary tree containing only one
prototype at each leaf, and the k-medians algorithm has been used to recursively
partition the training set to build the tree.
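
For reference, the Levenshtein distance between two strings can be computed with the standard dynamic programming recurrence. The sketch below assumes unit costs for insertion, deletion and substitution; the text does not state which costs were used in the experiments, and the function name `levenshtein` is only illustrative.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b with unit operation costs."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

Each call costs time proportional to the product of the two string lengths, which is why reducing the number of distance computations matters for this kind of data.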

The experiments were repeated for several values of ε and, in order to test
the effect of using more than just one neighbour to classify, the k-NSN and k-NN
schemes were used for classification; the values of k ranged from 1 to 15. Figure 2
shows the evolution of both error rate and classification time of a 1-NN search
for increasing values of ε (1-NSN and 1-NN results are the same by definition).
The results for k = 15 are plotted in figure 3, which shows as a reference the
1-NN error rate and classification time.

As figures 2 and 3 show, the choice of a value for ε depends on the amount
of allowable error increase, or on the amount of speed increase required. Also,
using more than just one neighbour to classify improves error rates, thus allowing
a higher value for ε. If classification time is critical for the task, then the best
choice seems to be the k-NSN, which requires no extra time over a k = 1 search
and improves NN classification rates. Using k-NN produces even lower error
rates, but with a certain time overhead. For the classification task presented in
this work, the overhead is low due mainly to the low value of k; higher values of
k have been tested but did not yield better classification results.

5 Conclusions and Future Work 

We have combined two techniques, one to reduce classification time and one to
improve classification rates, and we have tested the combination on a classic
and widely known fast NN search algorithm, the Fukunaga and Narendra
algorithm. The results show that classification using approximate search
(with ε > 0) is considerably faster, about four times faster for the chromosome
database. Also, the classification rates may be improved using either the
k-NSN or the k-NN classification scheme, which yield rates always better than
those of a non-approximate NN classifier, even with high values of ε.

Fig. 3. Comparison of error rates and classification times for several values of ε, for
k = 15.

F. Moreno-Seco, L. Mico, and J. Oncina

As for the future we plan to apply the same techniques to tree-based NN 
search algorithms other than Fukunaga and Narendra’s. We will also study the 
relation between the value of e and the classification time and accuracy, using 
also other databases, either synthetic or real. 

Acknowledgments. The authors wish to thank Alfons Juan for providing us 
with the chromosomes database. 
Abstract. We propose the use of the Difference Histogram method for the
classification of soya seed speckle images. The time history speckle patterns
(THSP), obtained from the dynamic speckle patterns of the seeds, were
processed for texture classification based on seed vigor and viability. In this
work, bean seeds were analyzed at different humidity levels, i.e. at 15, 25, 35
and 45 minutes after the sample was taken out of the humid germination
paper and submitted to the imaging process, with the aim of determining the
influence of this temporal parameter on the classification result (dead or alive).
The whole set of seeds was previously analyzed and classified as viable (alive)
or not viable (dead) by experts, applying a traditional method.
According to the results obtained, the proposed method proved to be
appropriate for the task of classifying seeds. The disagreement between our
method and the conventional one was greatest in the case of the highest
humidity level. It should be noted that in this case the analyzed images were
noisier.



1 Introduction 

Dynamic speckle is a phenomenon that occurs when laser light is scattered by
objects showing some type of activity [1]. This is the case for many biological samples
such as seeds [2], fruits, etc., and for some non-biological ones such as corrosion
phenomena and the drying of paints [3].

In the activity images [3] corresponding to dynamic speckles, the size, shape and
spatial distribution of the areas of equal gray tone change with time.
Seed quality can be characterized by taking into account the vigor and viability
of seeds through textural analysis of the time history of speckle patterns (THSP),
obtained from the dynamic speckles [4]. In the THSP images, the rows represent
different points on the object and the columns their intensity state at every sampled
instant. The activity of the sample appears as intensity changes in the horizontal
direction. So, when a phenomenon shows low activity (dead seeds), time variations of
the speckle pattern are slow and the THSP shows elongated shapes. When the
phenomenon is very active (live seeds), the THSP resembles an ordinary granulated
speckle diagram. Consequently, textural features provide information that allows the
quantitative characterization of the seed state.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 329-333, 2003.
© Springer-Verlag Berlin Heidelberg 2003

M. Fernandez et al.

Many methods for texture analysis have been reported [5-8], such as statistical and 
spectral ones using digital filtering [9-11]. We propose a methodology based on the 
Histogram Difference Method developed by Unser in 1986 [7] to characterize texture 
of the THSP images. The images were analyzed through the Unser discrimination 
function which was computed by using the elements of the difference histogram [7] as 
texture features. This is the histogram of the image obtained by subtracting a shifted 
image from the original image. 

In this work, bean seeds were analyzed at different humidity levels, i.e. at 15, 25,
35 and 45 minutes after the sample was taken out of the humid germination paper
and submitted to the imaging process. This was done in order to study the influence of
this temporal parameter on the classification process (dead or alive). The whole set of
seeds was previously analyzed and classified as viable or not by experts,
applying a traditional method.

A similar procedure could be used to classify other kinds of images corresponding
to dynamic speckle processes.



2 Time-History Images of Dynamic Speckle Patterns (THSP)

In the optical experiment, dynamic speckle patterns corresponding to different seeds 
were registered. The samples were illuminated with an expanded and attenuated 10 
mW He-Ne laser. The speckle images were then registered by a CCD camera, 
digitized to 8 bits by a frame grabber and stored in the memory of a personal 
computer. 




Fig. 1. (a) Dynamic speckles corresponding to an "alive" seed (b) Dynamic speckles
corresponding to a "dead" seed

To record the time evolution of a speckle pattern we used the method of Oulamara et al.
[4]. That is, for every state of the phenomenon being assessed, 512 successive
images of the dynamic speckle pattern were registered and a certain column was
selected in each of them. With the selected columns, a new 512×512 pixel composite
image was then constructed. In this image, named the time history of the speckle
pattern (THSP), the rows represent different points on the object and the columns
their intensity state at every sampled instant.

The activity of the sample appears as intensity changes in the horizontal direction.
So, when a phenomenon shows low activity (dead seeds), time variations of the
speckle pattern are slow and the THSP shows elongated shapes. When the
phenomenon is very active (live seeds), the THSP resembles an ordinary (spatial)
speckle pattern as shown in Fig. 1.
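
The construction of a THSP from a sequence of frames can be sketched as follows. This is an illustrative sketch only: frames are assumed to be plain nested lists indexed as `frames[t][row][col]`, and the function name `build_thsp` is hypothetical.

```python
def build_thsp(frames, column):
    """Stack the same column of successive frames side by side.

    frames[t][r][c] is the intensity at row r, column c of frame t.
    The result satisfies thsp[r][t] == frames[t][r][column], so each row
    follows one object point through time.
    """
    n_rows = len(frames[0])
    return [[frame[r][column] for frame in frames] for r in range(n_rows)]
```

With 512 frames of 512 rows each, the result is the 512×512 composite image described above.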



3 Difference Histogram Method 

The difference histogram of the image Y[i,j] is the histogram of the auxiliary image
D[i,j] obtained by subtracting the original image from its replica, with a relative
displacement of $[d_1, d_2]$ between them:

$$D[i,j] = Y[i + d_1,\, j + d_2] - Y[i,j] \qquad (1)$$



The classification method is completed by applying the Bayesian decision rule and
by assuming a multinomial distribution law for the histogram values $x = (x_1, \ldots, x_n)$ [7].
The image whose histogram is x belongs to the class “i” if:

$$U_i(x) = \min_j \{ U_j(x) \}, \quad j = 1, \ldots, k \qquad (2)$$

where:

$$U_j(x) = -\sum_i x_i \log [P_{ji}] \qquad (3)$$

where $P_{ji}$ is the probability that the HD takes the value $x_i$ for the class “j”. These
values are obtained from the HD patterns, which are representative of the k classes.

In this experiment we have only one class, “dead”, and for this reason it was
necessary to use the parameter $U_j$ not to discriminate between several classes but to
define a discrimination threshold between “dead” and “not dead” or “alive” textures.
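
The difference histogram and the discrimination function $U_j$ can be sketched as below. The sketch makes several assumptions that are not in the text: images are nested lists of integer gray levels, the displacements are non-negative, the class probabilities are estimated as relative frequencies of the pattern histogram, and a small `floor` value is used to avoid log(0). All function names are hypothetical.

```python
import math


def difference_histogram(image, d1, d2, levels=256):
    """Histogram of Y[i+d1][j+d2] - Y[i][j] over the valid region (d1, d2 >= 0).

    Bin b holds the count of the difference value b - (levels - 1).
    """
    rows, cols = len(image), len(image[0])
    hist = [0] * (2 * levels - 1)
    for i in range(rows - d1):
        for j in range(cols - d2):
            diff = image[i + d1][j + d2] - image[i][j]
            hist[diff + levels - 1] += 1
    return hist


def pattern_probabilities(pattern_hist, floor=1e-6):
    """Relative frequencies of a class pattern, floored and renormalized."""
    total = sum(pattern_hist)
    probs = [max(h / total, floor) for h in pattern_hist]
    norm = sum(probs)
    return [p / norm for p in probs]


def discrimination(hist, probs):
    """U_j(x) = -sum_i x_i log P_ji for histogram x and class probabilities P_j."""
    return -sum(x * math.log(p) for x, p in zip(hist, probs))
```

A lower value of `discrimination` means the histogram is better explained by the pattern, which is consistent with thresholding $U_j$ to separate the “dead” class from the rest.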
4 Classification Procedure, Results

Four groups of THSP images of seeds, corresponding to different levels of humidity,
were formed. From each humidity group of 54 samples, one THSP image was
selected as the most representative of the “dead” state, according to its textural
characteristics. The texture of this image was taken as the pattern corresponding to the
“dead” class.

A subset of 10 THSP image textures of heterogeneous character was then selected
(five of them corresponding to seeds qualified as "alive" and the other five as "dead"
by a conventional method). The values of the Unser discrimination function relative to
the corresponding pattern were computed for this subset. These values were used to
define a discrimination threshold between “alive” and “dead” textures for each
humidity group. The heterogeneous character of the sample subset helps to fit
the threshold more precisely.

For the classification process, the value of the discrimination function of each sample
was compared to the threshold in order to classify the seeds as viable or not viable.
The results were compared with those obtained by the conventional method.
Fig. 2. Classification results for two levels of humidity. The value of the
discrimination function $U_j$ relative to the pattern is plotted for each THSP image.



The straight line represents the threshold value of the discrimination function. For 
our method all results located under this line belong to the “dead” or non viable class. 
The crosses correspond to the sample subset that was considered to define the 
threshold value. The asterisks correspond to coincident classifications according to 
both methods: this method and the conventional one, and the circles correspond to 
non-coincident classifications. 

The percentage of coincident classifications was 85%, 91%, 92% and 90% for drying
times of 15, 25, 35 and 45 minutes, respectively.
5 Conclusions

According to the results obtained, the proposed method proved to be appropriate for
the task of classifying seeds. The disagreement between our method and the
conventional one was greatest in the case of the highest humidity level. It
should be noted that in this case the analyzed images were noisier.

The characterization process of seeds carried out in this work, using dynamic speckles
and image processing, constitutes a less subjective method than the traditional ones
and allows the automation of the process, increasing its efficiency.

The proposed method could be applied not only to the classification of seeds but also
to any other phenomenon where the mean speckle lifetime is a significant measure of
some type of sample activity, making possible the characterization of the temporal
evolution of the phenomenon.
Abstract. Ventricular late potential detection can be used as a non-invasive
diagnostic tool, but traditional detection techniques need around 300 heartbeats
and fail to obtain the beat-to-beat information. This paper combines a modified 
signal averaging and an adaptive enhancer to deal with non-stationary environ- 
ments and get beat-to-beat information from as little as 60 beats. In the ven- 
tricular late potential region of the recovered signal, discernible patterns indi- 
cate the presence or not of such waveforms. A maximum absolute value “aver- 
aging” can emphasize the boundaries of the QRS complex even further to suc- 
cessfully detect ventricular late potentials. 



1 Introduction 

Ventricular late potentials (VLPs) are low-amplitude, wideband-frequency waveforms 
that appear, in the last portion of the QRS complex or beginning of the ST segment, in 
the high-resolution electrocardiogram (see top plot in Fig. 1) of patients with some 
life-threatening cardiac diseases. Consequently, the detection of ventricular late po- 
tentials can be used as a non-invasive diagnostic marker. However, their detection 
constitutes a challenge because ventricular late potentials are masked by the other 
components of the electrocardiogram (ECG) and noise and interference, both in time 
and frequency domains. 

The most commonly used noise reduction strategy for ventricular late potential
detection is coherent averaging [1]. This needs around 300 heartbeats and may
introduce a severe low-pass filtering effect due to misalignments, not to mention that
the beat-to-beat information is completely lost. A modified signal averaging technique
[6] [7], which is a combination of mean and median filtering, outperforms coherent
averaging, although the problem of the loss of beat-to-beat information remains.

The standard time-domain analysis employs a high-pass filter to enhance the
ventricular late potentials, while attenuating the other components of the ECG and the
noise and interference. To avoid ringing and to ensure that the onset and offset of the
filtered QRS coincide with those in the original signal, a bi-directional four-pole
Butterworth high-pass recursive digital filter is used [4]. This filter, however, cannot
be applied in a single direction. In addition, it introduces distortion within the QRS
complex. Recently, a finite impulse response filter design based on a parallel
combination of all-pass and binomial low-pass filters has been proposed to overcome these
problems [6] [7].

A novel adaptive enhancer with modified signal averaging and maximum absolute 
value “averaging” was designed to obtain certain beat-to-beat information and to 
emphasize the boundaries of the QRS complex. This facilitates the recognition of 
patterns in the ventricular late potential (VLP) region, differentiating VLP and non- 
VLP subjects for diagnosis purposes. The new algorithm can handle certain non- 
stationary environments and provide beat-to-beat information. 



2 Adaptive Enhancer Plus Modified Signal Averaging for 
Beat-to-Beat Ventricular Late Potential Detection 

An alternative time-domain analysis strategy for beat-to-beat ventricular late potential
detection, based on adaptive line enhancing (ALE) plus modified signal averaging
(MSA), was designed here. For ventricular late potential detection, ALE alone may
not be good enough [3]. However, by combining ALE and MSA, good results have been
obtained with fewer than 64 beats, even under extremely noisy conditions [5]. Here, the
initial ALE plus MSA prototype was analyzed and improved.



2.1 Initial Adaptive Line Enhancing Plus Modified Signal Averaging Prototype 

Adaptive line enhancing followed by modified signal averaging was proposed for 
ventricular late potential detection in [5]. Using this approach, it was concluded that 
the acquisition time can be reduced five-fold to approximately one minute, while 
maintaining standards in noise reduction for ventricular late potential analysis. How- 
ever, in a more complete evaluation, increasing the number of real signals, some 
limitations of this prototype system became apparent. 

For some real high-resolution electrocardiographic signals, the system caused trou- 
blesome ringing inside the QRS due to instabilities. The concatenation of consecutive 
windowed heartbeats may introduce discontinuities on the adaptive line enhancing 
input, due to different levels of the PR and ST segments in the vicinity of the QRS 
complex. These discontinuities may cause instability in the algorithm, introducing 
ringing in the output signal. In addition, the same effect may appear in association with the
abrupt transitions of the QRS. Consequently, a new adaptive enhancer was devised in
an attempt to overcome these limitations.



2.2 New Adaptive Enhancer with Modified Signal Averaging 

Fig. 1 shows the new adaptive enhancer with modified signal averaging. The original
high-resolution electrocardiographic signal (top plot in Fig. 1) was high-pass filtered
(second plot in Fig. 1). This improves the performance of the adaptive algorithms
by reducing the dynamic range of the input signals, attenuating the drift within the
isoelectric segments and enhancing the signal-to-noise ratio in general. In addition,
the baseline wander almost disappears and the filtered PR and ST segments
become level, avoiding any sharp transition in the subsequent concatenation process.

An enhanced version of the double-level QRS detector algorithm [6] was used to
detect N fiducial marks (vertical lines in Fig. 1). Then, the filtered signal was
windowed around the marks (100 ms to the left and 156 ms to the right). This yields a
256N-element vector, by concatenation, and an N-by-256 matrix, x, where every row
represents a windowed heartbeat, to perform the modified signal averaging [7].
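
The windowing and concatenation step can be sketched as follows, assuming one sample per millisecond so that each window holds 256 samples. The exact window boundaries, the handling of marks near the ends of the record, and the function names `window_beats` and `concatenate` are assumptions made for illustration.

```python
def window_beats(signal, marks, left=100, right=156):
    """Return one row per fiducial mark: samples mark-left .. mark+right-1.

    Marks too close to either end of the record are skipped (an assumption).
    """
    rows = []
    for m in marks:
        if m - left >= 0 and m + right <= len(signal):
            rows.append(list(signal[m - left:m + right]))
    return rows


def concatenate(rows):
    """Flatten the N-by-256 matrix into a single 256N-element vector."""
    return [sample for row in rows for sample in row]
```

The matrix form feeds the modified signal averaging, while the concatenated vector is the form that an adaptive filter processes sample by sample.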




Fig. 1. New adaptive enhancer with modified signal averaging 

In the new enhancer, the modified signal averaging [7] was computed to obtain a
good quality reference for the adaptive enhancer, by repeating its output vector
y N times. The main mission of the new enhancer was not to enhance the signal but to track
small changes that are otherwise lost in the averaging process. This gives certain information
on the ventricular late potential beat-to-beat variability that can help in the diagnosis.

Given the matrix x, a matrix m (of size (2d+1)N-by-256) is obtained by appending to
it d delayed versions of x (each delayed by one column with respect to the previous one)
and d advanced versions of x (each advanced by one column with respect to the previous
one). Then, the modified signal averaging obtains the output y as

$$y_j = \frac{\sum_{k=1}^{(2d+1)N} w_{kj}\, m_{kj}}{\sum_{k=1}^{(2d+1)N} w_{kj}} \qquad (1)$$

where $w_{kj}$ is the k-th element (row) of the j-th column of the weighting matrix w, which has
the same size as m, and d = 1. The elements of the weighting matrix can be 0 or 1,
depending on the median of every column of matrix x ($\mathrm{med}(x_j)$) and the standard
deviation of the background noise ($\sigma_{iso}$),

$$w_{kj} = \begin{cases} 1 & \text{if } |m_{kj} - \mathrm{med}(x_j)| < 2\sigma_{iso} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
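
A possible reading of the modified signal averaging is sketched below: each output sample is the mean of the samples of m that lie within $2\sigma_{iso}$ of the column median of x. The sketch rests on assumptions not stated in the text: shifted rows repeat their edge sample, the median is returned when no sample passes the test, and the names `shift_row` and `modified_signal_average` are hypothetical.

```python
from statistics import median


def shift_row(row, s):
    """Shift a row by s columns (s > 0 delays, s < 0 advances), repeating the edge sample."""
    n = len(row)
    return [row[min(max(j - s, 0), n - 1)] for j in range(n)]


def modified_signal_average(x, sigma_iso, d=1):
    """Average, per column, the samples of m within 2*sigma_iso of the median of x."""
    # m stacks x together with its d delayed and d advanced versions.
    m = [shift_row(row, s) for s in range(-d, d + 1) for row in x]
    y = []
    for j in range(len(x[0])):
        med = median(row[j] for row in x)
        kept = [row[j] for row in m if abs(row[j] - med) < 2 * sigma_iso]
        # Fall back to the median if no sample passes the test (an assumption).
        y.append(sum(kept) / len(kept) if kept else med)
    return y
```

Samples far from the median, such as an ectopic beat or a burst of noise, receive weight 0 and do not contribute to the average.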

Due to the steep transition between the large QRS complex and low level ven- 
tricular late potentials, it was “conjectured” that filtering from the end of the data 
toward the beginning would give better results. Consequently, a time reversed or 
“flipped” filtering scheme was adopted. 

A flipping operation yields the main input (i) and the reference (r) for the adaptive
enhancer. By flipping the concatenated vectors over, the high-resolution
electrocardiogram sequence is processed in the backward direction. Processing the
flipped sequence from left to right achieved better stability (less ringing and distortion),
and good tracking of the ventricular late potential changes.

A second-order RLS adaptive scheme, with a forgetting factor of 0.95, was used to 
“enhance” the signal i with the reference r. The adaptive enhancer here does not re- 
duce noise in the isoelectric segment compared to the reference r, but it detects certain 
changes in the ventricular late potential segment. It should be mentioned, to be pre- 
cise, that the system cannot follow every change in i because of the limitations of the 
reference r, but it gives a good idea of the variability. 

The RLS algorithm allows a fast adaptation to any variation in the signal. By using
a forgetting factor p of 0.95, the data in the distant past are forgotten [2] and the
enhancer can cope with a certain degree of non-stationarity. A filter of order two was
found to be a good compromise, providing an acceptable frequency separation (long
enough) with quick convergence and a low computational load (short enough). It was
found that the algorithm converged during the first windowed heartbeat. This
enhancer can track not only amplitude variations, but also displacement or phase variations.
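
A second-order RLS filter with a forgetting factor of 0.95 can be sketched with the standard recursive least squares recursion. How the two signals are wired is an assumption here: the reference r is taken as the filter input and the main input i as the desired signal, and the a priori filter output is returned. The function name `rls_enhance` and the initialization constant `delta` are also illustrative.

```python
def rls_enhance(main, reference, order=2, forgetting=0.95, delta=100.0):
    """Standard RLS: filter `reference` so that its output follows `main`.

    Returns the a priori output (computed before each weight update).
    """
    w = [0.0] * order
    # Inverse correlation matrix, initialized to delta * identity.
    P = [[delta if a == b else 0.0 for b in range(order)] for a in range(order)]
    out = []
    for n in range(len(main)):
        # Regressor: current and past reference samples (zeros before the start).
        u = [reference[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        Pu = [sum(P[a][b] * u[b] for b in range(order)) for a in range(order)]
        denom = forgetting + sum(u[a] * Pu[a] for a in range(order))
        gain = [v / denom for v in Pu]
        y = sum(w[a] * u[a] for a in range(order))
        err = main[n] - y
        w = [w[a] + gain[a] * err for a in range(order)]
        uP = [sum(u[a] * P[a][b] for a in range(order)) for b in range(order)]
        P = [[(P[a][b] - gain[a] * uP[b]) / forgetting for b in range(order)]
             for a in range(order)]
        out.append(y)
    return out
```

With a forgetting factor of 0.95 the weight of a sample decays by half roughly every 14 samples, which is what lets the filter follow slow changes from beat to beat.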

The vector at the output of the adaptive system has to be flipped to recover the
normal forward direction. This recovered vector can be written as an N-by-256
matrix, $o_2$, by taking every heartbeat as a different row. Finally, $o_2$ can be used to obtain
an “averaged” vector $o_3$ by means of a maximum absolute value (MAV) operation. To
avoid the influence of outliers, the samples of every column of $o_2$ were sorted and the
top 5% and the bottom 5% were trimmed before applying the MAV.
The MAV operation selects, from every column of the matrix $o_2$ after trimming, the
sample whose absolute value is maximum to obtain the vector $o_3$.
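
The trimmed MAV operation can be sketched as follows. The rounding of the 5% trim (here truncated to an integer number of samples per end) and the function name `trimmed_mav` are assumptions.

```python
def trimmed_mav(o2, trim=0.05):
    """Per column: sort, drop the top and bottom `trim` fraction, keep the
    remaining sample with the largest absolute value."""
    n_rows = len(o2)
    cut = int(n_rows * trim)  # samples removed at each end (truncated)
    o3 = []
    for j in range(len(o2[0])):
        column = sorted(row[j] for row in o2)
        kept = column[cut:n_rows - cut]
        o3.append(max(kept, key=abs))
    return o3
```

Unlike a mean, this operation keeps the sign and the full amplitude of the selected sample, so peaks that vary from beat to beat are not smoothed away.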



3 Adaptive Enhancer Evaluation 

The adaptive enhancer implemented here includes two features that have to be tested.
This scheme provides a matrix (output $o_2$) including certain beat-to-beat information
and, at the same time, yields an “averaged” vector (output $o_3$) that can be used for an
overall detection of the ventricular late potential, equivalent to the standard method.

To evaluate the algorithms in very realistic scenarios, a high-resolution
electrocardiographic (HRECG) database was created. This database includes the HRECG signals
from 59 post-myocardial infarction patients and 63 healthy volunteers with no
evidence of cardiovascular disease. Five-minute records of the bipolar X, Y, and Z leads
from each subject were simultaneously collected at a sampling frequency of 1 kHz [6].
Furthermore, simulations of the HRECG signal, ventricular late potentials (fixed-VLP
and variable-VLP) and noise were designed to evaluate these algorithms in a more
controlled environment [6]. More than a thousand combinations were used for testing.

The quality of a particular segment of the HRECG signal can be expressed in terms
of several parameters. Some of the most important parameters used here to qualify a
recovered sequence are the variance of the noise $\sigma_e^2$, which represents the noise power,
the bias $b_e$, and the signal-to-noise ratio SNR [6].



3.1 Beat-to-Beat Information 



Fig. 2 shows an example of how the novel adaptive enhancer plus modified signal
averaging recovers the high-resolution electrocardiographic signal from a noisy
environment. The 60-heartbeat records shown in the figure were previously high-pass
filtered and windowed around the fiducial marks (-100 ms/+156 ms), to confine the
3-D plots. The time axis was reversed to see the ventricular late potential region (around
150 - 200 ms).
Fig. 2. Example of the performance of the adaptive enhancer with a noisy variable-VLP record



The plot on the top shows a clean record with variable ventricular late potentials
(cVLPji). Observe, however, that there is no recognizable pattern in the region of
interest (VLP region, around 150 - 200 ms) in the noisy signal nVLPji (central plot).
The plot on the bottom represents the signal recovered by using the adaptive enhancer
plus modified signal averaging (output $o_2$ explained above). In the recovered signal,
there is some distortion close to the steepest regions due to the adaptive algorithm.
This distortion prevents the exact ventricular late potential beat-to-beat
structure from being followed, which is the main limitation of the adaptive scheme. However, the bottom
plot in Fig. 2 clearly shows the pattern of variable ventricular late potentials, which
can be distinguished even on a beat-to-beat basis.

As expected, the performance of the algorithm is better for lower levels of noise. It 
is important to note that, for the same level of noise, the algorithm performs better 
with non-VLP and fixed-VLP records than with variable-VLP records, although some 
distortion may be present close to the peaks. Fig. 3 shows an example of that 
performance with a non-VLP noisy record. Observe the contrast between the enhanced 
signals (those at the bottom) in Fig. 2 (variable-VLP) and Fig. 3 (non-VLP). 




Fig. 3. Example of the performance of the adaptive enhancer with a noisy non-VLP record 



3.2 Overall Detection of Ventricular Late Potential 

The rationale for the maximum absolute value at the end of the enhancing scheme was 
that the new modified signal averaging, used to generate the reference r for the 
adaptive enhancer, worked well in the isoelectric (PR and ST) segments, providing good 
recovery of that part of the signal, but tended to attenuate the peaks. In the adaptation 
process, the instability is highest around those peaks because of the worse match 
between the main input i and the reference input r, and because of the adaptation process 
itself. The maximum absolute value block does not greatly affect the isoelectric 
segment, but it “catches” the peaks (including those in the ventricular late potential 
region) that are otherwise lost in the averaging process. In this case, the “averaged” 
vector o3 does not exhibit better noise reduction in the isoelectric segment than y, but 
rather a larger difference between the ventricular late potential and the isoelectric 
regions, which is more evident in the presence of variable ventricular late potentials. 
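
The contrast between averaging and a maximum absolute value across beats can be 
sketched on toy data. The code below is only an illustration under simplifying 
assumptions: it operates directly on a few aligned, hand-made beats rather than on the 
output of the adaptive enhancer, and the function names are hypothetical. 

```python
def ensemble_average(beats):
    """Sample-by-sample mean across aligned beats (conventional averaging)."""
    n_beats = len(beats)
    return [sum(column) / n_beats for column in zip(*beats)]


def max_absolute_value(beats):
    """Sample-by-sample maximum of |x| across aligned beats."""
    return [max(abs(x) for x in column) for column in zip(*beats)]


# Three aligned toy beats: a flat baseline and a small deflection that
# appears in only one beat (a stand-in for a variable late potential).
beats = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
averaged = ensemble_average(beats)      # the deflection shrinks to about 0.1
peak_kept = max_absolute_value(beats)   # the deflection stays at 0.3
```

A deflection present in only some beats is divided by the number of beats when 
averaged, whereas the maximum absolute value keeps its full amplitude, at the cost of 
also keeping any noise peak of similar size. 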

Fig. 4 shows the absolute value of the “averaged” 60-heartbeat filtered signal obtained 
with the standard method and with the adaptive enhancer plus maximum absolute value, 
i.e. output o3, compared to the first filtered heartbeat of the ideally clean signal. It 
can be seen that the standard averaging method works acceptably well in the isoelectric 
segments, but considerably attenuates the variable ventricular late potentials (VLP 
region), making it difficult to distinguish the end of the QRS (offset). In contrast, 
the adaptive enhancing plus maximum absolute value, although it introduces some 
distortion, intensifies the differences between the isoelectric segment and the 
ventricular late potential region, making the offset easier to recognize. 



[Plot panels: Ideally clean (1st beat); Standard averaging] 

Fig. 4. Absolute value of the “averaged” filtered signals compared to one ideally clean beat 



When the ventricular late potentials are fixed (beat-to-beat repeatable), the 
distortion introduced by the algorithm can be disregarded. In these cases, the offset is 
again easily distinguishable, providing good discrimination between VLP and non-VLP 
subjects. 

A detailed study with 1-min test records showed a perfect classification for the 
enhancing plus maximum absolute value scheme when the duration of the QRS was used as a 
discriminant feature. 

In the previous sections, a limited set of representative figures illustrates the results 
of the more exhaustive evaluation. However, it was verified qualitatively (by simple 
inspection) and quantitatively (by computing $\sigma_e^2$, $b_e$, and SNR) that the 
algorithms presented here consistently outperform those reported in [1] and [5]. 



4 Conclusions 

The number of heartbeats needed by the processing algorithms designed here was 
reduced to fewer than 60 (i.e. approximately 5 times fewer than for the standards). By 
reducing the acquisition time, the high-resolution electrocardiographic signal is less 
likely to exhibit non-stationary behavior; nevertheless, the algorithms implemented 
here have a certain capability to handle non-stationary data (forgetting factor < 1 and 
modified signal averaging, which rejects outliers). The new adaptive enhancer plus 
modified signal averaging provides beat-to-beat information, and different patterns 
are associated with the VLP region for VLP and non-VLP subjects. 

Although some other tests have to be performed before definitively introducing 
these algorithms into clinical practice, the results so far show a great improvement 
in sensitivity and specificity. The processing techniques assessed outperformed 
the classical time-domain analysis method. Improved processing algorithms to detect 
and analyze ventricular late potentials allow for better diagnostic capabilities. 
Therefore, the results of this work can have a direct impact on the lives of many 
individuals. 



Abstract. In this paper we present the methodologies and experiments followed 
for the implementation of a system for the automatic recognition and classification 
of infant cry patterns. We show the different stages through which the system is 
trained to identify normal and hypo acoustic (deaf) cry. The cry patterns are 
represented by acoustic features obtained by the Mel-Frequency Cepstrum and Linear 
Prediction Coding techniques. For the classification we used a feed-forward neural 
network. Results from the different methodologies and experiments are shown, as well 
as the best results obtained up to the moment, which reach an accuracy of up to 96.9%. 



1 Introduction 

Infant crying is a means of communication which, although more limited, is similar to 
adult speech. Through crying, the baby shows his or her physical and psychological 
state. Based on human and animal studies, it is known that the cry is related to the 
neuropsychological status of the infant [1]. According to the specialists, the crying 
wave carries useful information that helps to determine the physical and psychological 
state of the baby, as well as to detect possible physical pathologies from very early 
stages. Previous works on the acoustical analysis of baby crying have shown that there 
are significant differences among the several types of crying, such as healthy, pain 
and pathological infant cry. Using classification methodologies based on 
Self-Organizing Maps, Cano [2] attempted to classify cry units from normal and 
pathological infants. In another study, Petroni used Neural Networks [3] to 
differentiate between pain and no-pain crying. Previously, in the seminal work done by 
Wasz-Hockert, spectral analysis was used to identify several types of crying [4]. In a 
recent investigation, Taco Ekkel [5] attempted to expand a set of useful sound 
characteristics and to find a robust way of classifying these features. Ekkel's goal 
was to classify neonate crying sounds into categories called normal or abnormal 
(hypoxia). However, up to this moment, there is no concrete and effective automatic 
technique based on baby crying that is useful for clinical and diagnostic purposes. 



2 Infant Cry 

Crying is the only means of communication that the baby has in the first months of life, 
before the use of signs or words. The crying wave is generated in the Central Nervous 
System, which is why the cry is thought to reflect the neuropsychological integrity of 
the infant and may be useful in the early detection of infants at risk of adverse 
developmental outcomes. In this work, two kinds of crying are considered: normal and 
pathological (hypo acoustical) crying. The Automatic Infant Cry Recognition process 
(Fig. 1) is basically a problem of pattern processing. The goal is to take the crying 
wave as the input pattern and finally obtain the type of cry or pathology detected in 
the baby. First, we take a sample set and apply acoustical analysis and principal 
component analysis to get a reduced vector, which is used to train the recognizer. 
Second, we take a test sample set and also apply acoustical analysis and principal 
component analysis to reduce the vector’s dimension. Then the reduced unknown 
vector is passed to the pattern classifier, which finally classifies the crying sample. 

In the acoustical analysis, the crying signal is analyzed to extract the most important 
features in the time domain. Some of the most common techniques for signal processing 
are Linear Prediction Coding, Cepstral Coefficients, Pitch and Intensity, among 
others. The features extracted from each sample are kept in a vector, and each vector 
represents a pattern. 




[Block diagram; output labels: Type of Cry, Detected Pathology] 

Fig. 1. Automatic Infant Cry Recognition Process 



Infant cry shows significant differences between the several kinds of crying, which 
can be perceptually distinguished by a trained person. The general acoustical features 
of normal crying are a rising-falling pitch pattern, an ascending-descending melody, 
and high intensity, as shown in Fig. 2. Pathological crying (Fig. 3) shows acoustical 
characteristics such as lower than normal intensity, rapid pitch shifts, generally 
glottal plosives, weak phonations and silences during the crying. 



Fig. 2. Waveform and spectrogram of normal crying 



Fig. 3. Waveform and spectrogram of pathological crying 



3 Mel Frequency Cepstral Coefficients 

The digitized sound signal contains irrelevant information and requires large amounts 
of storage space. To simplify the subsequent processing of the signal, useful features 
must be extracted and the data compressed. The power spectrum of the speech signal 
is the most often used method of encoding. Mel Frequency Cepstral coefficients 
(MFCCs) [6] are used to encode the speech signal. Cepstral analysis calculates the 
inverse Fourier transform of the logarithm of the power spectrum of the speech signal. 
For each utterance, the cepstral coefficients are calculated for all samples in 
successive frames. The energy values in 20 overlapping Mel-spaced frequency bands are 
calculated. As a result, each frame is represented by 16 or 21 MFCCs, depending on the 
experiment. 
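
To make the idea of overlapping Mel-spaced bands concrete, the following sketch lays 
out the edges of 20 such bands. It is an illustration only: it assumes one common 
definition of the Mel scale, mel = 2595 log10(1 + f/700), an upper limit of 4000 Hz 
(half of an 8000 Hz sampling rate), and triangular-style bands whose edges are the 
centers of their neighbours. The software used for the experiments may define these 
differently, and all function names are hypothetical. 

```python
import math


def hz_to_mel(f_hz):
    """One common Mel-scale definition (an assumption, see the text above)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low_hz, f_high_hz):
    """Return n_bands (low, center, high) tuples in Hz, equally spaced in Mel.

    Neighbouring bands overlap: the low and high edges of each band are the
    centers of the bands on either side.
    """
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (n_bands + 1)
    points = [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
    return [(points[i], points[i + 1], points[i + 2]) for i in range(n_bands)]


bands = mel_band_edges(20, 0.0, 4000.0)
widths = [high - low for low, _, high in bands]  # bands widen with frequency
```

Because the spacing is uniform in Mel rather than in Hz, the bands are narrow at low 
frequencies and progressively wider towards the upper limit. 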



4 Linear Prediction Coefficients 

The objective of applying these techniques is to describe the signal in terms 
of its fundamental components. Linear Prediction (LP) analysis has been one of the 
most widely used time-domain analysis techniques in recent years. LP analysis attempts 
to predict a speech sample "as well as possible" through a linear combination of 
several previous signal samples. Thus, the spectral envelope can be efficiently 
represented by a small number of parameters, in this case the LP coefficients. As the 
order of the LP model increases, more details of the power spectrum of the signal can be 
approximated. 
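
The prediction of a sample from a linear combination of previous samples can be 
sketched as follows. This is a minimal illustration using the autocorrelation method 
solved with the Levinson-Durbin recursion, which is one standard way of obtaining LP 
coefficients; it is not claimed to be the procedure used by the software in the 
experiments, and the function names are hypothetical. 

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lp_coefficients(signal, order):
    """Return ([a_1..a_p], residual_energy) with x[n] ~ sum_k a_k * x[n-k]."""
    r = autocorrelation(signal, order)
    if r[0] == 0:
        raise ValueError("signal has zero energy")
    a = []       # predictor coefficients found so far
    err = r[0]   # prediction error energy
    for m in range(1, order + 1):
        if err <= 0.0:
            # Signal already perfectly predicted: pad remaining coefficients.
            a += [0.0] * (order - len(a))
            break
        # Reflection coefficient for model order m.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        k_m = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= (1.0 - k_m * k_m)
    return a, err


# Toy check: a decaying exponential x[n] = 0.9**n satisfies x[n] = 0.9 * x[n-1].
signal = [0.9 ** n for n in range(200)]
coeffs_order1, _ = lp_coefficients(signal, 1)
coeffs_order2, _ = lp_coefficients(signal, 2)
```

For this toy signal a first-order model is already sufficient, so the second 
coefficient of the order-2 model comes out close to zero. 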



5 Neural Networks 

Neural Networks are one of the most widely used methodologies for classification and 
pattern recognition. Among the most commonly used neural network models are the 
feed-forward networks, which use some version of the back-propagation training method. 
In general, a neural network is a set of nodes and a set of links. The nodes correspond 
to neurons and the links represent the connections and the data flow among neurons. 
Connections are quantified by weights, which are dynamically adjusted during training. 
The required training can be done through the back-propagation technique. During 
training (or learning), a set of training instances is given. Each training instance is 
typically described by a feature vector (called an input vector). It should be associated 
with a desired output (a concept, a class), which is encoded as another vector, called 
the desired output vector. In our study, several methods were tested to train the 
feed-forward neural networks. 

5.1 Training with Scaled Conjugate Gradient Method 

After analyzing the performance of the algorithms [13], we looked for the one combining 
high classification accuracy with low training time. Under these conditions we selected 
SCG to continue with our experiments. From an optimization point of view, learning in a 
neural network is equivalent to minimizing a global error function, which is a 
multivariate function that depends on the weights in the network. Many of the training 
algorithms are based on the gradient descent algorithm. SCG belongs to the class of 
Conjugate Gradient Methods, which show super-linear convergence on most problems. By 
using a step size scaling mechanism, SCG avoids a time-consuming line search per 
learning iteration, which makes the algorithm faster than other second order 
algorithms. We also obtained better results with SCG than with the other training 
methods and neural networks tested, such as standard back-propagation and the cascade 
neural network. 



6 Training Process and Experimentation 

We made two kinds of experiments, one with Linear Prediction Coefficients (LPCs) 
and the other with Mel-Frequency Cepstral Coefficients (MFCCs). The samples for 
training and testing were selected at random. Training stops when the maximum number of 
epochs is reached, when the maximum amount of time has been exceeded, or when the 
performance error has been minimized. To be sure the performance is at an acceptable 
level, in terms of accuracy and efficiency, we used the 10-fold cross validation 
technique [10]. The sample set was randomly divided into 10 disjoint subsets, each time 
leaving one subset out for testing and using the others for training. After each 
training and testing process, the classification scores were collected, and a new 
complete process started. In this way, we performed 10 different classification cycles, 
until all data subsets had been used once for testing. All the experiments were done 
without ever using the same data for both training and testing. Once the 10 experiments 
were done, the overall scores were calculated as the average of all the individual 
ones. For the MFCC and LPC analysis, the samples were segmented into windows of 
50 ms and 100 ms for different experiments. We extracted 16 and 21 MFCCs per 
window. Depending on the number of coefficients and the window length, we obtained a 
different number of parameters for each sample. For example, with 16 coefficients per 
50 ms window and a one-second sample, the feature vector contains 320 parameters 
(20 windows of 16 coefficients each), corresponding to 320 data inputs to the neural 
network. In this situation, the dimension of the input vector is large, but the 
components of the vectors are highly correlated (redundant). It is useful in this 
situation to reduce the dimension of the input vectors. An effective procedure for 
performing this operation is Principal Component Analysis (PCA). After several tests, 
we got good results with 50 parameters per vector or pattern [13]. 
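
The 10-fold procedure described above can be sketched as follows. This is a minimal 
illustration, not the code used for the experiments: the function names and the fixed 
random seed are assumptions, and the example size of 314 corresponds to the balanced 
set of 157 samples per class reported for the experiments. 

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validation_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_indices(314, 10)
fold_sizes = sorted(len(f) for f in folds)  # six folds of 31 and four of 32
```

Each sample appears in exactly one test fold, so no sample is ever used for both 
training and testing within the same cycle. 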



7 Data Set 

A set of 116 samples has been directly recorded from 53 babies by pediatricians, 
with Sony ICD-67 digital recorders, and then sampled at 8000 Hertz. The same 
pediatricians label each sample at the end of its recording. The pathological samples 
are collected by a group of doctors specialized in communication disorders, from babies 
they have already diagnosed as deaf. The babies selected for recording range from 
newborns up to 6 months old, regardless of gender. The corpus collection is still in its 
initial stage and will continue for a while. The 116 crying records, from both 
categories, were segmented into signals one second in length. A total of 1036 segmented 
samples were obtained, 157 of them belonging to normal cry and 879 to pathological cry. 
For the reported experiment, we took the same number of samples for each class, 157. 



8 System Implementation 

For the acoustic processing of the cry waves, we used Praat 4.0.2 [11] to obtain the 
LPCs and MFCCs. To perform pattern recognition, a feed-forward network with 50 input 
nodes, 15 hidden layer nodes and 2 output nodes was developed for training and testing 
with LPC and MFCC samples. The number of nodes in the hidden layer was established 
heuristically. The neural network and the training methods were implemented with the 
Neural Network Toolbox of Matlab 6.0.0.88 [12]. The same Matlab version was used to 
implement the PCA algorithm. 
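
The shape of such a 50-15-2 network can be illustrated with a forward pass written in 
plain Python. This sketch is not the Matlab implementation: the tanh hidden units, the 
softmax outputs, the random initial weights and every function name are assumptions 
made for illustration, and training (for example with SCG) is omitted. 

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum plus bias for every unit of a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_layer, output_layer):
    """Forward pass: 50 inputs -> 15 tanh hidden units -> 2 softmax outputs."""
    hidden = [math.tanh(v) for v in dense(inputs, *hidden_layer)]
    scores = dense(hidden, *output_layer)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hidden_layer = init_layer(50, 15, rng)
output_layer = init_layer(15, 2, rng)
pattern = [rng.gauss(0.0, 1.0) for _ in range(50)]  # stand-in for 50 principal components
probabilities = forward(pattern, hidden_layer, output_layer)
```

With two output nodes, the class with the larger output value would be taken as the 
predicted type of cry. 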



9 Experimental Results 

To establish an adequate number of principal components, we analyzed the information 
that each number of components preserves. Fig. 4 shows the information preserved 
by the principal components from 1 to 928. For example, the first 10 components keep 
90.89%, while the first 30 components keep 93.08% of the information in the original 
features. The original LPC feature vector was reduced to different numbers of principal 
components. Their performance was evaluated by measuring the precision of the 
classifier with vectors containing between 10 and 110 principal components (Fig. 5). As 
can be observed, up to 50 components the recognition accuracy increases as the number 
of principal components increases. From 50 components on, the precision decreases 
slightly (Fig. 5). Based on this analysis we selected 50 as the size of the input 
vector. 
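
The preserved information discussed above can be computed as a cumulative fraction, as 
in the sketch below. It assumes that "preserved information" means the share of total 
variance carried by the leading components, expressed through the eigenvalues of the 
covariance matrix; the eigenvalues in the example are invented for illustration and the 
function names are hypothetical. 

```python
def preserved_information(eigenvalues):
    """Cumulative fraction of total variance kept by the first k components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    kept, running = [], 0.0
    for value in ordered:
        running += value
        kept.append(running / total)
    return kept


def components_needed(eigenvalues, target_fraction):
    """Smallest k whose cumulative preserved fraction reaches the target."""
    for k, fraction in enumerate(preserved_information(eigenvalues), start=1):
        if fraction >= target_fraction:
            return k
    return len(eigenvalues)


# Invented eigenvalues, for illustration only.
toy_eigenvalues = [4.0, 3.0, 2.0, 1.0]
curve = preserved_information(toy_eigenvalues)   # about [0.4, 0.7, 0.9, 1.0]
k_for_85 = components_needed(toy_eigenvalues, 0.85)
```

A curve of this kind rises quickly for the first components and then flattens, which 
is why classification accuracy, rather than preserved variance alone, was used to 
settle on the final vector size. 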

Fig. 4. Preserved information for different numbers of principal components 

We did not analyze more than 110 components because our goal was to reduce the 
original feature vector to a manageable size. These results were obtained with the 
SCG network configured with 50, 15 and 2 nodes in the input, hidden, and output 
layers, respectively. 



[Plot: Classification with different numbers of principal components using an ANN; 
horizontal axis: number of principal components] 

Fig. 5. Accuracy achieved by using different numbers of principal components. 



9.1 Training and Classification Results 

The neural networks were trained to classify the cries into normal and pathological 
classes. For training with the 10-fold cross validation technique, the sample set is 
divided into 10 subsets: 4 groups with 32 samples and 6 groups with 31 samples. Each 
time, one set is left out for testing and the remaining sets are used for training. This 
process is repeated until all sets have been used once for testing. The classification 
accuracy was calculated by dividing the number of samples correctly classified by the 
network by the total number of samples in the test data set. Some of the best results 
obtained for both types of features, LPC and MFCC, are shown in the confusion matrices 
in Table 1 and Table 2, respectively. In both cases, the results were produced by the 
network from an input vector of 50 principal components. The reduced vectors come from 
the original feature vectors, which are one-second samples described by 21 coefficients 
per 100 ms window. Table 1 shows the results obtained with the LPC features, and 
Table 2 those obtained with the MFCC features. 



348 



J. Orozco-Garcia and C.A. Reyes-Garcia 



Table 1. Infant Cry classification for LP Coefficients 

                               Confusion Matrix 
Type of Cry    # of Samples    Normal    Deaf    Classification 
Normal         157             150       7 
Deaf           157             11        146 
Total          314                               94.3 % 



Table 2. Infant Cry classification for MFC Coefficients. 

                               Confusion Matrix 
Type of Cry    # of Samples    Normal    Deaf    Classification 
Normal         157             149       8 
Deaf           157             2         155 
Total          314                               96.80 % 
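
The overall classification percentages follow directly from the confusion matrices, as 
the short check below illustrates. The function name is hypothetical, and the matrices 
are read with rows as the true type of cry and columns as the class assigned by the 
network. 

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows: true class (normal, deaf); columns: assigned class (normal, deaf).
lpc_confusion = [[150, 7], [11, 146]]   # Table 1
mfcc_confusion = [[149, 8], [2, 155]]   # Table 2

lpc_accuracy = 100.0 * accuracy(lpc_confusion)     # 296 of 314 samples
mfcc_accuracy = 100.0 * accuracy(mfcc_confusion)   # 304 of 314 samples
```

Rounded to one decimal place, these ratios give the 94.3 % and 96.8 % shown in the 
tables. 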



10 Conclusion and Future Work 

This work has shown that the results obtained with the MFCC features are better than 
those obtained with the LPC features in the test. Besides observing the neural 
network’s performance, we have gathered useful acoustical information on the infant 
cry. We hope this information will be helpful to pediatricians and doctors in general. 
As can be noticed in the confusion matrices, many samples from one class can still be 
confused as belonging to the other. We are working to explain why that happens, in 
order to avoid the problem and to improve the classification accuracy. At this moment 
we are starting new experiments to also identify asphyxia in newborn babies from their 
crying. As future work we plan to collect enough samples to train the classifiers 
appropriately and to have some other classes to classify. We still intend to experiment 
with mixed features, as well as with hybrid intelligent classification models. 
Moreover, we will attempt to identify the degree of deafness. 



Abstract. Knowing a patient’s risk at the moment of admission to a medical unit 
is important for both clinical and administrative decision making: it is fundamental 
for carrying out a health technology assessment. In this paper, we propose a 
non-supervised learning method based on cluster analysis and genetic algorithms 
to classify patients according to their admission risk. This proposal includes an 
innovative way to incorporate the information contained in the diagnostic hypotheses 
into the classification system. To assess this method, we used retrospective 
data of 294 patients (50 of whom died) admitted to two Adult Intensive Care Units (ICU) 
in the city of Santiago, Chile. The area under the ROC curve was used to 
verify the accuracy of this classification. The results show that, with the proposed 
methodology, it is possible to obtain an ROC curve with an area of 0.946, whereas 
with the APACHE II system it is possible to obtain an area of only 0.786. 



1 Introduction 

In order to determine the admission risk, it is necessary to know the patient’s condition 
at the moment he or she is admitted to a medical unit. This condition represents very 
valuable information for making clinical decisions, such as the admission of the patient 
to an Intensive Care Unit (ICU), and helps decide the distribution of resources within 
the unit [1-2]. Besides, it is crucial for carrying out a health technology assessment [3]. 
The problem that exists when determining the admission risk is that, after the 
preliminary assessment is made, the patient is treated, which modifies his or her initial 
condition. A true quantification of this risk could be achieved by assessing the results 
of the natural evolution of the illness without applying medical technologies, which 
would be ethically impossible. 

In medicine, the traditional way to deal with this problem has been the creation of 
physiological indexes intended to determine the seriousness of the patient’s condition 
at the moment of admission, such as the Simplified Acute Physiology Score (SAPS) [4] 
or the Acute Physiology and Chronic Health Evaluation (APACHE) [5]. Among the main 
disadvantages of these indexes are the lack of an adequate characterization 
of the medical information at the moment of admission, such as a quantification of 
co-morbidities [6], the linear characteristics of the composition of these indexes, and 
the lack of precision in determining each individual patient’s risk. Because of this last 
point, these indexes make it possible to carry out global comparisons between units 
(they offer accurate risk values on average), but they are not of great help in assessing 
technologies within a medical unit. 

* This study has been supported by FONDECYT (Chile) project No. 1990920 and 
DICYT-USACH No. 02-021 9-0 ICHP 

At present, there are a number of publications that aim to forecast the outcome 
of medical procedures. To this end they have applied different Data Mining methods 
such as Logistic Regression, Cluster Analysis, Neural Networks and Bayesian Networks 
[2,7-9]. In particular, the work carried out by Pena-Rojas and Sipper [8] contains an 
extensive review of genetic algorithm applications in various fields of medicine. Of 
the aforementioned works, the one that is closest to admission risk estimation is the one 
published by Dubowski et al. [7], which analyses the outcome considering the deaths that 
take place during admittance to the hospital. In spite of the fact that the prediction of 
the outcome is useful when making some decisions - such as the admission of patients - 
these predictions do not represent the condition of the patient at the moment of 
admission, and the use of these results to make a health technology assessment is 
questionable, since it includes the utilization of the same technology it intends to 
assess [10]. 

This work’s proposal is based on two fundamental points. The first one consists 
of using a group of variables similar to those used in physiological indexes such as 
APACHE [5], together with an original way of quantifying diagnostic hypotheses 
(including co-morbidities) in order to incorporate the medical knowledge that exists at 
the moment of admission. The second point consists of using a non-supervised learning 
method, which takes k-means clustering as a basis, together with a strong search method 
such as genetic algorithms. This type of learning allows the formation of groups of 
patients based exclusively on the information we have about each patient at the moment 
of admission. Later on we use the information contained in the outcome of the 
interventions, only to assign a risk to each group. It is always possible to do this if we 
accept the hypothesis that interventions tend to modify the patient’s state positively. 
In this way, the groups that show a greater number of patients who did not survive, in 
spite of the treatment, represent the groups of higher initial risk. Thus it will be 
possible, later on, to classify a new patient depending exclusively on the conditions 
presented at the moment of admission. 
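
The step of assigning a risk to each group from the outcomes can be sketched as 
follows. This is only an illustration of the idea described above: the cluster labels 
and outcomes are invented toy data, and the function names are hypothetical. 

```python
from collections import defaultdict


def cluster_risk(cluster_labels, died):
    """Return {cluster: observed death rate} from admission clusters and outcomes."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [deaths, patients]
    for label, outcome in zip(cluster_labels, died):
        counts[label][0] += 1 if outcome else 0
        counts[label][1] += 1
    return {label: deaths / patients for label, (deaths, patients) in counts.items()}


def rank_by_risk(risk):
    """Clusters ordered from lowest to highest observed death rate."""
    return sorted(risk, key=risk.get)


# Invented toy data: six patients grouped into three clusters at admission.
labels = ["A", "A", "B", "B", "B", "C"]
died = [False, False, True, False, True, True]
risk = cluster_risk(labels, died)
ordering = rank_by_risk(risk)
```

The outcomes are used only after the groups have been formed, so the grouping itself 
depends exclusively on admission-time information. 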

In order to assess the proposed methodology, we have used the information taken 
from 294 medical records of patients who were admitted to two Intensive Care Units 
(ICU) for adults. These were classified in accordance with the methodology described 
above. The results were compared, by means of ROC curves [12], with the classification 
that resulted from applying the APACHE II [11] index. 



2 Data Collection and Pre-processing 

The data were obtained from the Intensive Care Units (ICU) of two public hospitals 
of the city of Santiago, Chile. A total of 294 medical records of adult patients were 
collected. 50 out of the 294 correspond to patients who died during their stay at ICU. 
Table 1 shows a list of the variables that were taken at the moment of admission or during 
the first 12 hours of hospitalization. 

352 



M. Chacon and O. Luci 



The discrete or binary variables, such as infection upon admission, can be represented 
directly, but the variables used to represent physiological acuteness require a monotonic 
severity scale in order to be used efficiently with the clustering method. These variables 
do not behave monotonically with respect to seriousness. For example, in its 
original form, temperature has a normal severity range located at the middle of the scale, 
but both hypothermia and fever produce an increase in the patient’s severity condition. 
To capture the severity increase in only one direction, we resort to the code used by the 
APACHE system, which uses discrete ranges between zero and four. In this way, severity 
always increases from the zero value, which is considered normal. To encode age we also 
use the five ranges defined in the APACHE system. 
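
The recoding of a two-sided variable into a one-directional severity code can be 
illustrated with temperature. The thresholds below follow the commonly cited APACHE II 
temperature ranges and are included only as an illustration; they should be checked 
against [5], they are normally stated for core temperature whereas this study records 
axillary temperature, and the function name is hypothetical. 

```python
def temperature_severity(temp_c):
    """Map a temperature in degrees Celsius to 0-4 severity points (0 = normal range).

    Thresholds are illustrative (commonly cited APACHE II ranges), not taken
    from this paper.
    """
    if temp_c >= 41.0 or temp_c < 30.0:
        return 4
    if temp_c >= 39.0 or temp_c < 32.0:
        return 3
    if temp_c < 34.0:
        return 2
    if temp_c >= 38.5 or temp_c < 36.0:
        return 1
    return 0


readings = (37.0, 38.7, 35.0, 33.0, 40.0, 31.0, 42.0, 29.0)
codes = [temperature_severity(t) for t in readings]
```

Both fever and hypothermia now map to higher codes, so distance computations in the 
clustering treat a departure from normal in either direction as increased severity. 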

The information contained in the diagnostic hypotheses was quantified using the 
knowledge of intensivists, with the double purpose of enriching the information 
contained in the diagnoses and carrying out a quantification that may reflect severity. 
The procedure included four steps. First, the 780 diagnoses (for each patient, the 
principal diagnosis and co-morbid states are considered, up to a maximum of eight) were 
split into 17 groups that represent physiological systems and morbid groups (Table 2). 



Table 1. List of input variables for the grouping method. 

Variable                                             Units 
Age *                                                In years 
Sex                                                  Binary 
Glasgow Coma Scale                                   0-10 
Admittance Infection                                 Binary 
Admittance with Respiratory Cardio arrest            Binary 
Pre-Admittance Surgery                               Binary 
Temperature * (axillary)                             °C 
Mean arterial pressure *                             mmHg 
Heart rate *                                         Beats per min 
Respiratory rate *                                   Breaths per min 
PaO2-AO2 * (Arterial Oxygen Pressure Difference)     mmHg 
pH * (Arterial)                                      pH (Arterial) 
Serum Sodium * (Sodium ion in the blood)             mEq/l 
Serum Potassium * (Potassium ion in the blood)       mEq/l 
Serum Creatinine * (It measures renal function)      mg/dl 
Hematocrit * (It measures globular mass)             % 
White Blood Count *                                  Cel/cm^3 
Neoplasia Presence                                   Binary 
Multiorganic Failure                                 Binary 

* Variables were codified according to APACHE index. 



Table 2. Quantification of the Diagnostic Hypothesis 

Physiological System or Morbid Group 
Neurological 
Respiratory 
Cardiovascular 
Renal 
Metabolic 
Digestive 
Endocrine 
Immunologic 
Obstetric-Gynecologist 
Osteomyoarticular 
Psychiatric 
Uro-gynecological 
Transplant 
Otorhinological 
Hematological 
Dermatologic 



Then, the specialists classified each diagnosis into three different categories: chronic, 
acute and hyperacute (except in the cases of trauma, which are all acute or hyperacute). 
A third step considered that in each physiological or morbid group there may exist more 
than one of these categories or a combination of them, for instance two chronic 
diagnoses and one hyperacute diagnosis. With these combinations, a classification system 
was created that had an associated increasing order of severity according to the 
combination of chronic, acute and hyperacute diagnoses in each patient. Finally, an 
algorithm was used to assign severity condition values to each one of the 17 groups for 
each patient, according to the combination of existing diagnoses. 



3 K-Means and Genetic Algorithms 

The combination proposed for the clustering draws on the main strengths of the 
traditional clustering methods, especially those of k-means [13], and of Genetic 
Algorithms (GA) [14]. From k-means, we keep the simplicity of the representation used 
for cluster creation, which influences the capacity to explore different alternatives at 
the same time, thus enlarging the solution space covered and the capacity to avoid being 
trapped in local minima. From a general perspective, the first problem to be solved is to 
determine the optimum number of clusters. We based the solution to this problem on a 
measure of clustering quality. The existing literature on cluster analysis shows that one 
of the most efficient measures of clustering quality is that of Calinski-Harabasz [15]. 
The range of the number of clusters is left as a parameter. Clustering starts 
with the largest number of clusters. After that, the quality of the clustering is measured, 
and then the closest clusters are identified and merged. This process continues until the 
lower bound of the selected cluster range is reached. Finally, the number of clusters 
that presents the highest value of the Calinski-Harabasz measure is selected. 
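
For reference, the Calinski-Harabasz measure can be computed as in the sketch below, 
using its usual definition as the ratio of between-cluster to within-cluster dispersion, 
each divided by its degrees of freedom. The code is a minimal illustration on toy 
points, with hypothetical function names, and does not reproduce the merging procedure 
itself. 

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def calinski_harabasz(clusters):
    """CH index for a list of clusters, each a non-empty list of points.

    CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster and
    W the within-cluster sum of squares; higher values mean better separation.
    """
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    overall = centroid([p for c in clusters for p in c])
    centers = [centroid(c) for c in clusters]
    between = sum(len(c) * squared_distance(m, overall)
                  for c, m in zip(clusters, centers))
    within = sum(squared_distance(p, m)
                 for c, m in zip(clusters, centers) for p in c)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Two groupings of the same four toy points.
well_separated = [[[0, 0], [0, 1]], [[10, 0], [10, 1]]]
mixed_up = [[[0, 0], [10, 1]], [[10, 0], [0, 1]]]
good_score = calinski_harabasz(well_separated)
poor_score = calinski_harabasz(mixed_up)
```

Evaluating this index for each candidate number of clusters and keeping the maximum is 
the selection rule described in the paragraph above. 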

For the particular case of admission risk, the selection of the cluster range by the 
user is easy, since a large number of clusters (over 10) does not contribute any 
significant advantage to the final classification and makes it difficult to identify 
later the risk of each cluster by assigning the death rate obtained from the results of 
the medical interventions. 



3.1 Representation and Initialisation 

One of the crucial aspects in the application of GA, and of meta-heuristic methods in 
general, is the nature of each particular problem. In the case of GA, the problem is to 
determine which elements of the problem will be represented in each individual's 
chromosome. After assessing different proposals found in the existing literature for the 
application of GA to clustering [16-17], we chose to represent in each chromosome the 
set of centroids calculated in each iteration. In this way, the GA is responsible for 
exploring the solution space at a higher hierarchical level, and the k-means component 
of the method is responsible for forming new clusters by assigning the individuals to 
the nearest centroid. 
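
A minimal sketch of this representation, assuming a chromosome is simply a list of 
centroids and each datum a list of numbers (both layouts, and the function names, are 
hypothetical). The assignment and recalculation functions correspond to the k-means 
component; leaving the centroid of an empty cluster unchanged is an assumption of the 
example. 

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, chromosome):
    """For each datum, return the index of the nearest centroid in the chromosome."""
    return [min(range(len(chromosome)), key=lambda j: sq_dist(p, chromosome[j]))
            for p in data]


def recalc_centroids(data, labels, chromosome):
    """Recompute every centroid as the mean of the data assigned to it.

    A centroid that attracts no data is kept as it was (assumption).
    """
    new_chromosome = []
    for j, old in enumerate(chromosome):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if members:
            new_chromosome.append(
                [sum(p[d] for p in members) / len(members) for d in range(len(old))])
        else:
            new_chromosome.append(list(old))
    return new_chromosome
```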

Random cluster creation was used for the initialization process, making sure that every 
cluster had at least one individual. The centroids of these clusters, which are used to 
create the chromosomes, were then calculated. This initialization process is carried out 
for all the individuals of the population. 



3.2 Selection, Crossing, and Mutation 

The selection was made using the proportional or roulette method [17], which is the 
method most commonly used when designing GA. To assign each clustering (or chromosome) 
its area in the roulette, the clustering quality measure called sum of squares [17] was 
used. The proportional method chooses, with higher probability, those individuals that 
have been better evaluated, but it does not discard those with a low evaluation. Pairs 
of chromosomes are gathered for the crossover; the same individual may be selected more 
than once in subsequent crossovers. At this stage all that matters is to verify that the 
chromosomes in each selected pair are different. 
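
The selection stage can be illustrated with the sketch below. The text does not state 
how the sum of squares is turned into a roulette area; since a smaller sum of squares 
means a better clustering, the example assumes an area proportional to its inverse. 
Excluding the first pick from the second spin is likewise an assumption, used here only 
to guarantee two different chromosomes. 

```python
import random


def roulette_pick(weights, rng):
    """Return an index with probability proportional to its (non-negative) weight."""
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    last_positive = None
    for i, w in enumerate(weights):
        if w <= 0.0:
            continue  # zero-area entries can never be selected
        acc += w
        last_positive = i
        if r < acc:
            return i
    return last_positive  # guards against floating-point round-off


def select_pair(sse_values, rng):
    """Pick two different chromosome indices by roulette.

    sse_values: sum of squares of each chromosome (lower is better).
    Area is taken as 1 / SSE (assumption, not stated in the text).
    """
    if len(sse_values) < 2:
        raise ValueError("need at least two chromosomes")
    weights = [1.0 / (s + 1e-12) for s in sse_values]
    first = roulette_pick(weights, rng)
    remaining = list(weights)
    remaining[first] = 0.0  # the partner must be a different chromosome
    second = roulette_pick(remaining, rng)
    return first, second
```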

To carry out the crossing, a pair of individuals is obtained from the selection stage 
and the crossing probability, previously assigned as an initial parameter of the 
program, is consulted. If the indication is no-crossing, the selected chromosomes are 
copied to the new generation. If, on the contrary, the indication is crossing, the 
crossing operator is applied and the offspring are placed in the new population. The 
crossing operator takes a random number of centroids from both parents and transmits 
them to the offspring. The total number of centroids must correspond to the number of 
clusters being sought at that stage. 

The chosen representation by centroids facilitates this crossing operation because, in 
order to guarantee the creation of valid individuals, it is only necessary to make sure 
that there are no repeated centroids. In case of repetition, a new centroid is chosen. 
In addition, this representation does not require manipulating the individual data for 
the crossing. It only requires assigning the data to the nearest centroid and then 
recalculating the centroids to form the new generation. 
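
The crossing operator described above might look like the following sketch. How many 
centroids come from each parent, and how a replacement is chosen when a centroid is 
repeated, are not detailed in the text; the choices below (a uniform random count from 
the first parent, the rest from the second, falling back to the first parent on 
repetition) are assumptions. Parents are assumed to hold the same number of distinct 
centroids. 

```python
import random


def crossover(parent_a, parent_b, p_cross, rng):
    """Centroid-level crossover returning two offspring of the same size as the parents."""
    k = len(parent_a)
    copy_a = [list(c) for c in parent_a]
    copy_b = [list(c) for c in parent_b]
    # No crossing: the parents are copied unchanged to the new generation.
    if k < 2 or rng.random() >= p_cross:
        return copy_a, copy_b

    def make_child(first, second):
        n_first = rng.randint(1, k - 1)        # random number of centroids from `first`
        child = rng.sample(first, n_first)
        donors = list(second)
        rng.shuffle(donors)
        for c in donors + first:               # `first` is only a fallback on repetition
            if len(child) == k:
                break
            if c not in child:                 # never repeat a centroid
                child.append(c)
        return [list(c) for c in child]

    return make_child(copy_a, copy_b), make_child(copy_b, copy_a)
```

After crossing, the data would be reassigned to the nearest centroid and the centroids 
recalculated, as the text describes. 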

The experimental tests showed that the centroid recalculation is a fundamental stage of 
this method because it substantially improves the results and the speed with which they 
are obtained. This recalculation may be interpreted as a local optimization that 
improves the quality of the individuals before they are passed on to the next 
generation. To carry out mutation, the mutation probability established as an initial 
parameter of the program is consulted. If mutation is indicated, the values of a 
centroid chosen at random are randomly altered. Modifying some of the centroid's values 
moves the individual through the solution space, giving the opportunity to produce new 
genetic material. 
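
A minimal sketch of the mutation step. The text only says that the values of a randomly 
chosen centroid are randomly altered; the Gaussian perturbation and its spread `sigma` 
are hypothetical choices made for the illustration. 

```python
import random


def mutate(chromosome, p_mut, rng, sigma=0.1):
    """Return a copy of the chromosome, possibly with one centroid perturbed.

    With probability p_mut, one centroid is chosen at random and Gaussian noise
    of standard deviation sigma (hypothetical parameter) is added to its values.
    """
    child = [list(c) for c in chromosome]
    if rng.random() < p_mut:
        j = rng.randrange(len(child))
        child[j] = [v + rng.gauss(0.0, sigma) for v in child[j]]
    return child
```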



4 Results 

The procedure followed for cluster formation is centred on adjusting the parameters to 
obtain the maximum value of the Calinski-Harabasz quality measure. After trying 
different parameter combinations, the quality measure reached a maximum of 48.4799 when 
forming four groups with a population of 20 individuals and crossing and mutation 
probabilities of 0.75 and 0.05, respectively. 

Once the clusters have been formed, the problem is to create an ordinal risk scale for 
the groups. To this effect, the proportion of dead to survivors (death rate) was 
identified in each group. If we accept that the use of technology avoids death, it is 
possible to affirm that the groups with the highest death rate correspond to the groups 
that presented a higher admission risk. 

Figure 1 shows the groups in decreasing order of risk, with the number of dead in each 
group. It is interesting to note that when the number of groups increases or diminishes, 
not only does the value of the clustering quality measure decrease, but the difficulty 
of differentiating the risk based on death proportions also increases. 



Fig. 1. Grouping obtained by the Genetic Algorithm (GA) for 4 groups (panel title: 
GA4Groups). CH = 48.4799. 

Fig. 2. Distribution of the dead in the groups formed by APACHE II. 



It is important to assess what has been achieved with the clustering. To this effect we 
can resort to the APACHE method, which is the method most commonly used to determine 
risk at the moment of admission to the ICU. In the units mentioned, all the variables 
that make up the APACHE II index are recorded. Using these variables it is possible to 
estimate the risk of each patient by means of a logistic transformation that assigns a 
probability to the weighted score obtained from the APACHE II variables. 

To make an equivalent comparison we formed four groups, gathering the individual risks 
in decreasing order. Figure 2 shows the clustering made according to the APACHE II 
index, indicating the number of dead and survivors in each cluster. 

It is also interesting to compare the efficiency of the proposed GA method with that of 
other traditional clustering methods. To carry out this comparison, a clustering was 
made using the k-means method, assigning risk in the same manner as in the case of GA. 

The presentation of results based on histograms is useful to identify the group risk, 
but does not offer an objective parameter for comparing the different methods. Given 
that the comparison is based on the classification between dead and survivors, it is 
possible to measure the sensitivity and specificity to detect death as if each method 
were a binary death classifier, with the discrimination threshold taken at each of the 
generated clusters. Each binary classification generates a point (sensitivity, 
specificity). Using these points, it is possible to draw an ROC curve, and the area 
below the ROC curve is then an objective comparative parameter. Figure 3 shows the ROC 
curves that measure the quality of the death classification of the APACHE II index 
(area of 0.786), the k-means method (area of 0.888) and the proposed method that uses 
GA (area of 0.946). 
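
The construction of the ROC curve from the risk-ordered groups can be sketched as 
follows. Each group is given as a pair (dead, survivors); the groups are ordered by 
decreasing death rate, one threshold is placed after each group, and the area is 
obtained with the trapezoidal rule. The function name and the input layout are 
assumptions for the example, and groups are assumed to be non-empty. 

```python
def roc_auc_from_groups(groups):
    """Area under the ROC curve of a grouping used as a death classifier.

    groups: list of (dead, survivors) counts, one pair per cluster.
    Returns (area, points), where points are (1 - specificity, sensitivity).
    """
    # Highest-risk group first: risk is measured by the death rate of the group.
    ordered = sorted(groups, key=lambda g: g[0] / (g[0] + g[1]), reverse=True)
    total_dead = sum(d for d, _ in ordered)
    total_surv = sum(s for _, s in ordered)
    points = [(0.0, 0.0)]
    true_pos = 0
    false_pos = 0
    for dead, surv in ordered:
        # Threshold placed after this group: it and all riskier groups predict death.
        true_pos += dead
        false_pos += surv
        points.append((false_pos / total_surv, true_pos / total_dead))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoidal rule
    return area, points
```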



Fig. 3. ROC curves for GA, K-Means and APACHE. 



As mentioned in the introduction, an important point in this design is the 
incorporation of information containing the diagnosis hypotheses, and it is therefore 
interesting to assess the contribution of this information. To carry out this 
assessment, the 17 variables that represent the diagnosis hypotheses were withdrawn and 
the proposed method was applied again. The area below the ROC curve was then 0.824. 



5 Discussion 

The histogram in Figure 2 (APACHE II) shows the groups in decreasing order of severity. 
In spite of this, the dead are distributed more homogeneously among the groups than in 
the proposed model (Figure 1). This shows the difficulty the APACHE system has in 
classifying the patients' individual risks correctly (even though in this case the 
patients are grouped together). For this reason the APACHE system has been criticized: 
although its effectiveness in determining the number of patients that may die in a unit 
is generally recognized, it fails in the individual classification [18-19]. This limits 
its field to comparative studies between units, and it offers no guarantees when it is 
necessary to know the admission risk of a specific patient. But this is exactly what is 
needed to carry out an effectiveness analysis of the technologies used in a unit. 

A more precise quantification of the differences between the GA grouping method and the 
APACHE system can be made by comparing the ROC curve areas. Knowing that the perfect 
discriminator has an area equal to one, the GA grouping reaches 0.946, exceeding the ROC 
curve area reached by the APACHE II system by 0.161. 

The influence of incorporating the GA into the method can be assessed by comparing the 
classification obtained through GA with the k-means classification. In this case the 
area of the GA classification surpasses that obtained by k-means by 0.058. This 
indicates that the mere idea of using a simple grouping method produces an increase of 
0.103 in the ROC curve area with respect to the classification obtained by the APACHE II 
system (36% of the gain is attributed to the GA and the remaining 64% corresponds to 
the grouping by itself). 

The experiment of deleting the variables that quantify the diagnosis hypotheses shows 
that removing this information produces a decrease of 0.122 in the area, which 
corresponds to 75.8% of the gain obtained with the full set of proposed variables with 
respect to APACHE II. 
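
The percentages quoted in this discussion can be recomputed from the reported figures, 
as in the short check below. The differences 0.161 and 0.103 are taken as quoted in the 
text; subtracting the three-decimal areas gives 0.160 and 0.102, so the quoted values 
were presumably obtained from unrounded areas (an assumption). The variable names are 
illustrative only. 

```python
# Areas under the ROC curve as reported in the text.
AUC = {"apache2": 0.786, "kmeans": 0.888, "ga": 0.946, "ga_without_diagnoses": 0.824}

# Differences as quoted in the text (not recomputed from the rounded areas).
TOTAL_GAIN = 0.161     # GA grouping over APACHE II
GA_GAIN = 0.058        # GA grouping over k-means
GROUPING_GAIN = 0.103  # k-means over APACHE II


def share(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


# Loss of area when the 17 diagnosis variables are withdrawn.
diagnosis_gain = round(AUC["ga"] - AUC["ga_without_diagnoses"], 3)

ga_share = share(GA_GAIN, TOTAL_GAIN)                # share attributed to the GA
grouping_share = share(GROUPING_GAIN, TOTAL_GAIN)    # share attributed to grouping alone
diagnosis_share = share(diagnosis_gain, TOTAL_GAIN)  # share tied to the diagnosis variables
```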

From this analysis we can infer that the basic idea of using an unsupervised method 
such as clustering produces an improvement with respect to the standard risk evaluation 
systems in medicine. The highest values, however, are achieved when the information 
contained in the diagnosis hypotheses is adequately incorporated and GA is used to 
achieve a better clustering of patients by risk. 



6 Conclusions 

Starting from the idea of using cluster analysis to build classifiers that classify 
patients according to their admission risk, this paper includes the quantification of 
diagnosis hypotheses and the GA as the two main pillars that proved fundamental to 
achieving the results shown here. 

We must point out the simplicity and reproducibility offered by diagnosis 
quantification, provided that specialists are available to carry out the diagnosis 
classification. The main problem with this classification is the splitting of diagnoses 
into two degrees of acuteness: acute and hyperacute. This problem can be overcome 
through consensus meetings with groups of specialists. 

Regarding the application of GA to cluster analysis, it is worth pointing out the 
simplicity of the representation and of the crossing stage, since the traditional 
applications to this problem use complex crossing operators [16-17], with a high 
probability of creating crippled descendants and of losing genetic material along the 
generations. The use of centroids in the chromosome formation preserves the ancestors' 
best genetic material more adequately, confining the random component exclusively to 
mutation. 

A second stage in this research would be to increase the number of patients, trying to 
include all the existing pathologies in one ICU; to implement a classification system to 
carry out a prospective study that classifies the patients at the moment of admission; 
and, later on, to assess the results achieved in the operation of the unit. 

The extension of this method to other complex treatment systems such as Neonatal 
Intensive Care Units, Coronary Units or Multiple Trauma Units offers great 
possibilities, since in these cases the pathologies are sometimes more restricted and it 
is possible to state the quantification of the diagnosis hypotheses even more 
accurately. But certainly the richest field for its application is that of studies on 
Technology Assessment in Health Care, where admission risk assessment is fundamental to 
achieve technology effectiveness. 



Abstract. Quantifying the intermuscular fat content and its distribution during the 
ripening process of Iberian ham is a task of technological interest. This paper studies 
Iberian ham during the ripening process using images obtained from an MRI (Magnetic 
Resonance Imaging) device and Pattern Recognition and Image Analysis algorithms, in 
particular Mathematical Morphology techniques. The main advantage of this method is its 
non-destructive nature. A concrete algorithm based on the Watershed transformation is 
proposed, and its results are compared with those of the Otsu thresholding algorithm. 
The decrease of the total volume during the ripening process is shown, and the decrease 
of the meat percentage and of the intermuscular fat content is also calculated. In 
conclusion, these techniques are shown to be viable for possible future use in the meat 
industry to discover new characteristics of the ripening process. 



1 Introduction 

Image classification by segmentation is a very important aspect of Computer Vision 
techniques. It is being applied in the field of Food Technology to determine features 
of food images, for example the quantification and distribution of intermuscular fat 
content in Iberian ham. In our study, these results are computed with the aim of 
improving the industrial process, since fat content and its distribution influence 
water loss and salt diffusion during the ripening process [1]. 

Iberian ham images have been processed in this research in order to find out some 
characteristics of this excellent product and reach conclusions about it. The Iberian 
pig is a native breed from the south-western area of Spain, and dry-cured ham from 
Iberian pig is a meat product with a high sensorial quality and first-rate consumer 
acceptance in our country. The ripening of Iberian ham is a long process (normally 18-24 
months). Physical-chemical and sensorial methods are required to evaluate the different 
quality-related parameters, and they are generally tedious, destructive and expensive 
[1]. Traditionally, the maturation time is fixed as the moment when the weight loss of 
the ham is approximately 30% [2]. Other methodologies have therefore long been awaited 
by the Iberian ham industry. 

The use of image processing to analyze Iberian products is quite recent. Some 
researchers have processed flat images of Iberian ham slices taken with a CCD camera for 
different purposes [3, 4, 5]. They estimated parameters of Iberian ham such as 
intramuscular fat content [5] and marbling [3], or classified various types of raw 
Iberian ham [4]. The results obtained were very encouraging and suggest their 
application to the systematic inspection of Iberian products. However, although Computer 
Vision is essentially a non-destructive technique, ham pieces must be destroyed to 
obtain images using these techniques. 

MRI (Magnetic Resonance Imaging) offers great capabilities to look inside bodies 
non-invasively. It is widely used in medical diagnosis and surgery. It provides multiple 
planes (digital images) of the body or piece. Its application to Food Technology is 
still recent and is confined to research purposes. Cernadas et al. [6, 7, 8] analyze MRI 
images of raw and cured Iberian loin to classify genetic varieties of Iberian pigs and 
to predict the intramuscular fat content [9]. The results are promising for application 
to ham [10]. The loin is a uniform and simple muscle, which is a very important 
advantage compared with the large number and complex distribution of muscles in the 
ham, which is a significant drawback. 

Image segmentation can be carried out with Mathematical Morphology methods. These 
techniques detect object structures or forms in images. A special segmentation method 
is the Watershed Transformation, which is based on Mathematical Morphology [11, 12, 13]. 
In this paper, Watershed Segmentation is applied to characterize the quantity and 
distribution of the fat content in Iberian hams. The results of this technique are 
tested against a well-known segmentation method based on Otsu thresholding [14]. A 
comparative study of the two techniques is carried out to analyze three regions in the 
images (background, intermuscular fat content and meat). 

This method is applied to a database of specific MR images from Food Technology, namely 
Iberian ham images obtained at three different maturation stages (raw, semi-cured and 
dry-cured Iberian ham). Mathematical Morphology is used to obtain the quantity and 
distribution of intermuscular fat content in the whole Iberian ham, studying its volume 
changes during the ripening process. The presented approach is verified by examining 
these changes, and the practical results obtained may allow us to optimize time, 
temperature and salt content during the ripening of Iberian ham. 



2 Data Set 

The presented research is based on MRI sequences of Iberian ham images. The application 
of MRI to Food Technology is still recent and is confined to research purposes. Four 
Iberian hams have been scanned at three stages during their ripening time. The images 
have been acquired using an MRI scanner provided by the "Infanta Cristina" Hospital in 
Badajoz (Spain). The MRI volume data set is obtained from sequences of T1 images with a 
FOV (field of view) of 120x85 mm and a slice thickness of 2 mm, i.e. a voxel resolution 
of 0.23x0.20x2 mm. The total number of images in the resulting database is 252. 



3 Classification Methods 

The Watershed transformation (based on regions) is used to classify the different 
tissues in images of Iberian hams, and is compared with the Otsu thresholding 
segmentation method (based on pixels). 



3.1 Thresholding: Otsu Method 



The Otsu method is a widely used thresholding segmentation technique, which can be 
employed for the classification of tissues. Concretely, the Otsu and multi-level Otsu 
methods minimize the weighted sum of group variances: let $\sigma_W^2$, $\sigma_B^2$ 
and $\sigma_T^2$ be the within-class variance, the between-class variance and the total 
variance, respectively [14]. An optimal threshold, t, can be determined by maximizing 
one of the following criterion functions with respect to t: 

$$\lambda = \frac{\sigma_B^2}{\sigma_W^2}, \qquad \eta = \frac{\sigma_B^2}{\sigma_T^2}$$ 

The Otsu segmentation method for intramuscular fat content in Iberian ham has been 
thoroughly evaluated by comparison with the chemical analyses routinely carried out on 
this kind of meat [15]. 
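
As an illustration of the criterion, the single-threshold case can be sketched on a 
gray-level histogram. Because the total variance does not depend on t, maximizing the 
between-class variance is equivalent to maximizing the criteria above; the sketch does 
exactly that. The histogram layout and the function name are assumptions, and the 
multi-level variant used for three classes is not shown. 

```python
def otsu_threshold(hist):
    """Single Otsu threshold for a gray-level histogram.

    hist[g] is the number of pixels with gray level g. Levels <= t form one class
    and levels > t the other. Returns the t that maximizes the between-class
    variance (ties are resolved in favour of the lowest t).
    """
    total = sum(hist)
    sum_all = sum(g * h for g, h in enumerate(hist))
    w0 = 0          # number of pixels in the lower class
    sum0 = 0.0      # sum of gray levels in the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist[:-1]):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: no valid split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```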



3.2 Mathematical Morphology: Watershed Segmentation 

The Watershed transformation segments images into watershed regions [10]. Considering 
the gray-scale image as a surface, each local minimum can be thought of as the point to 
which water falling on the surrounding region drains (Figure 1). 

Noise and small unimportant fluctuations in the original image can produce spurious 
minima, which lead to oversegmentation. Smoothing the original image is one approach to 
overcoming this problem. 

We have used the Watershed Transformation to classify three different regions in 
Iberian ham images from MRI (background, meat and fat). 

Figure 2 shows images from the proposed algorithm. The different regions can be seen in 
the Watershed images, together with the classification into three categories. 

The algorithm works on the original images of our database (Figure 2.A shows one of 
these images). The image is filtered by a Morphological Closing, using as structuring 
element a disk with a radius of 5 pixels (Figure 2.B). A median filter with a window 
size of 5x5 pixels is then applied. The next step consists of using a Sobel operator to 
detect the boundaries within which Watershed regions could be placed. The output image 
is converted to a binary (bilevel) image and a median filter is applied again. All these 
steps (median filter, Sobel operator, binarization and median filter again) are 
necessary to avoid oversegmentation of the regions by the Watershed Transformation. 
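
The pre-processing chain after the closing (median filter, Sobel operator, binarization 
and a second median filter) can be sketched in plain Python as below. This is only an 
illustration: the morphological closing and the watershed itself are omitted, images are 
lists of rows, borders are handled by replicating edge pixels, and the binarization 
threshold and the size of the second median window are hypothetical parameters that the 
text does not specify. 

```python
from math import hypot
from statistics import median


def _at(img, y, x):
    """Pixel value with edge replication outside the image (assumption)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def median_filter(img, size=5):
    """Median filter with a size x size window."""
    r = size // 2
    return [[median(_at(img, y + dy, x + dx)
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1))
             for x in range(len(img[0]))] for y in range(len(img))]


def sobel_magnitude(img):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            gx = (_at(img, y - 1, x + 1) + 2 * _at(img, y, x + 1) + _at(img, y + 1, x + 1)
                  - _at(img, y - 1, x - 1) - 2 * _at(img, y, x - 1) - _at(img, y + 1, x - 1))
            gy = (_at(img, y + 1, x - 1) + 2 * _at(img, y + 1, x) + _at(img, y + 1, x + 1)
                  - _at(img, y - 1, x - 1) - 2 * _at(img, y - 1, x) - _at(img, y - 1, x + 1))
            row.append(hypot(gx, gy))
        out.append(row)
    return out


def binarize(img, threshold):
    """Bilevel image: 1 where the value exceeds the threshold, 0 elsewhere."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


def preprocess(img, threshold, final_size=3):
    """Median filter, Sobel, binarization and a second median filter."""
    smooth = median_filter(img, 5)
    edges = binarize(sobel_magnitude(smooth), threshold)
    return median_filter(edges, final_size)
```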



Fig. 1. Intuitive definition of the Watershed Transformation (Watershed regions: 1 and 
2; Watershed lines: A and B). 



When the image has been correctly pre-processed, the Watershed Transformation is 
applied, producing several images of interest. One of them is Figure 2.C, which shows 
the Watershed regions, each labelled with the average gray level of the same region in 
the original image. Another is Figure 2.D, with the original image mixed with the 
Watershed regions. Figure 2.E illustrates the final multi-level Otsu classification into 
three gray levels. Finally, Figure 2.F is the classification of the Watershed regions 
obtained by the proposed algorithm, which can easily be compared with the original image 
(2.A) and with the Otsu classification (2.E). The numerical results and the comparison 
of both methods are shown in the following sections. 



Fig. 2. Example images. A) Original image. B) Morphological closing. C) Watershed 
regions with the average of gray levels for each area. D) Binarization of the image with 
the watershed regions. E) Classification in three regions by thresholding with the Otsu 
method. F) Classification in three regions by the Watershed transformation. 



4 Results 

Four bar charts show the results obtained with the two tested algorithms. Three stages 
of the ripening process, at which MR images were acquired, have been considered: raw, 
semi-dry and dry-cured ham. 

The decrease of the total volume during the ripening process of the Iberian ham can be 
observed in Figure 3. The total absolute reduction is shown in Figure 3.A, as the 
average of the four hams considered in this study. The results of both techniques 
(Watershed and Otsu) are quite similar. For example, the first bar of the graph shows 
values near 80,000 pixels in both cases. The percentage reduction has been 15-20% on 
average from the initial to the final stage (Figure 3.B). 



Fig. 3. Total size reduction, in absolute (A) and relative (B) values. Panels: "Total 
size" (A) and "% Size reduction" (B), each plotted against maturation stage for 
Watershed and Otsu. 

Figure 4 shows the percentage size of the fat (4.A) and meat (4.B) content as an 
average of the four hams (relative quantity in relation to the number of pixels in the 
ham area). Here too the results of the Watershed and Otsu segmentations are similar. 

In addition, as expected, the total size of the ham decreases; however, the relative 
quantity of fat content remains roughly constant. 
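
The quantities plotted in Figures 3 and 4 amount to simple pixel counts. A minimal 
sketch, assuming a classified image stored as rows of class labels with 0 for 
background, 1 for meat and 2 for fat (this coding and the function names are 
hypothetical): 

```python
def tissue_percentages(class_map, background=0, meat=1, fat=2):
    """Ham size in pixels and fat/meat percentages over the ham area."""
    counts = {background: 0, meat: 0, fat: 0}
    for row in class_map:
        for label in row:
            counts[label] += 1
    ham = counts[meat] + counts[fat]  # background pixels are not part of the ham
    if ham == 0:
        raise ValueError("no ham pixels in the image")
    return {
        "ham_pixels": ham,
        "fat_pct": 100.0 * counts[fat] / ham,
        "meat_pct": 100.0 * counts[meat] / ham,
    }


def percent_reduction(initial_size, final_size):
    """Relative size reduction between two maturation stages, in percent."""
    return 100.0 * (initial_size - final_size) / initial_size
```

For instance, going from about 80,000 pixels to a hypothetical 66,000 pixels would be a 
reduction of 17.5%, inside the 15-20% range reported above. 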



5 Discussion and Conclusions 

The results obtained using both computer vision techniques are quite consistent with the 
quantities calculated by the Food Technology specialists, who have estimated the total 
weight decrease of the Iberian ham at 30% over the same period. The proposed algorithm 
obtains a reduction of the ham size of approximately 15-20% in all the processed 
images. Therefore, as a first approximation, a relationship between the ham weight 
decrease (30%) and the total volume reduction (15%) could be established for the 
maturation time. This size reduction could be caused by the loss of water during the 
maturation process. The optimal ripening time may not be the same for different Iberian 
hams. By studying the percentage rate of volume change during the process, it could be 
possible to predict the optimal ripening moment. 



Fig. 4. Fat (A) and meat content (B), in percentage over the ham volume. Panels: "% Size 
(Fat content)" (A) and "% Size (Meat)" (B), each plotted against maturation stage (raw, 
semi-dry, dry-cured) for Watershed and Otsu. 

Comparing the two techniques, the Otsu method can be considered a good way to evaluate 
the fat level, but it does not offer information about the distribution of the fat, 
i.e. it is only a good quantification method. The classification is carried out pixel by 
pixel, so wrong pixel classifications can occur, for example when noisy pixels appear in 
the image. 

On the other hand, the Watershed transform classifies full regions into the selected 
class: the whole region (all the pixels of the watershed region) is assigned to the 
appropriate category. Therefore, this method provides not only quantitative information 
but also the distribution of the fat inside the image, which is quite important too. 
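
The difference between the two approaches can be illustrated with a small sketch of 
region-wise classification: every watershed region receives a single class according to 
its average gray level in the original image, so an isolated noisy pixel inherits the 
class of its region. The label map is assumed to come from a watershed step that is not 
shown, and the gray-level limits separating background, meat and fat are hypothetical 
values. 

```python
def classify_regions(image, labels, thresholds=(50, 180)):
    """Assign one class per region from its mean gray level.

    image:      rows of gray levels.
    labels:     rows of region identifiers, same shape as image.
    thresholds: increasing gray-level limits (hypothetical); a region whose mean
                exceeds n of them gets class n (0 darkest ... len(thresholds) brightest).
    Returns a class map with the same shape as the image.
    """
    sums, counts = {}, {}
    for img_row, lab_row in zip(image, labels):
        for value, region in zip(img_row, lab_row):
            sums[region] = sums.get(region, 0) + value
            counts[region] = counts.get(region, 0) + 1
    region_class = {}
    for region in sums:
        mean = sums[region] / counts[region]
        region_class[region] = sum(mean > t for t in thresholds)
    return [[region_class[region] for region in row] for row in labels]
```

In the usage below, the pixel with gray level 110 would fall in the middle class if 
classified on its own, but is assigned the class of its bright region. 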

The main aim of this paper is the quantification and distribution of the intermuscular 
and subcutaneous fat content. The method selected for this classification is the 
Watershed Transformation, because it not only computes the quantity of fat but also 
locates the regions of fat content precisely. 



Abstract. A method for the automatic detection of dark fibres in wool tops 
based on image processing is proposed. Software implementing this method 
was developed, composed of five modules: KL projection, light correction, 
Gabor filtering, segmentation and morphology. The digital images are taken 
by a camera placed in a balanced illumination system. The method was 
calibrated and tested on 170 images marked by experts from the Secretariado 
Uruguayo de la Lana. 

Keywords: Dark Fibres Detection, Wool Industry, Balanced Illumination, 
Light Correction, Gabor Filtering 



1 Introduction 

1.1 Context 

One of the problems of the wool industry is the presence of impurities in its raw 
material. One kind of impurity is dark fibres coming from urine stains, black 
spots or other animal defects. These fibres remain coloured after the dyeing 
process and appear as undesired dark lines when the tops are destined for the 
production of soft-coloured fabrics. Wool depreciates 15% [1] if it contains more 
than a specified number of dark fibres (DF) per kg. The wool is collected from the 
sheep by shearing, and the fleece is then washed and combed. At this stage the 
product is called wool top and is the input for the spinning industry. 

In order to improve wool quality, it is necessary to control the number of DF as 
early as possible in the industrial process, but the problem is easier at the end 
of the industrial chain because the fibres are clean and parallelized. There is a 
standardized method for manually counting the DF per kg in the top [2], in use at 
the Secretariado Uruguayo de la Lana (SUL). 

This article presents an automatic solution for the DF counting module in wool 
tops, integrating image processing techniques and the associated acquisition 
system. 

1.2 Specification and Image Data Bases 

Wool fibres are categorized by their coloration level on a 0 to 8 scale. The CSIRO 
standard [2] [3] defines as dark fibres those with a level greater than 4, with 
the additional condition of being longer than 10 mm [2]. 

Fibre diameters are between 20 and 30 µm. Working with a camera of 1536x2048 
resolution, an 18x24 mm field of view was selected. A collection of 170 images of 
this size was acquired, combining top samples of the Merino and Corriedale breeds. 
A SUL expert indicated the presence of DF in the images and gave their position 
and colour level. The database formed by these images was divided into two groups: 
the A base, with 70 images for system calibration, and the B base, reserved for 
validation. A new partition was made in the A base to form the sub-bases A1 and 
A2, with 20 and 50 images respectively. 
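
The acquisition figures above fix the scale at which fibres appear in the image. The 
short calculation below is only a worked example of that arithmetic (function names 
are illustrative): with 18 mm over 1536 pixels, one pixel covers about 11.7 µm, so 
a fibre is roughly 2 pixels wide and the 10 mm minimum length is roughly 850 pixels. 

```python
def pixel_size_um(field_mm, pixels):
    """Size of one pixel in micrometres for a given field of view and resolution."""
    return 1000.0 * field_mm / pixels


def length_in_pixels(length_um, pixel_um):
    """Length of an object in pixels."""
    return length_um / pixel_um


PIXEL_UM = pixel_size_um(18.0, 1536)                   # same value along the 24 mm side
FIBRE_MIN_PX = length_in_pixels(20.0, PIXEL_UM)        # thinnest fibre, in pixels
FIBRE_MAX_PX = length_in_pixels(30.0, PIXEL_UM)        # thickest fibre, in pixels
MIN_DF_LENGTH_PX = length_in_pixels(10000.0, PIXEL_UM) # 10 mm minimum DF length
```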

1.3 Balanced Illumination System 

To acquire the images, the digital camera was placed on a controllable illumination 
system. The wool sample is placed between upper and lower light sources. The light 
intensity is balanced to cancel the shadows of white fibres, make the background 
vanish and enhance the presence of the DF [3] [4]. Figure 1 shows the effect of 
this system by comparing the same image acquired with wrong and with balanced 
illumination. 




Fig. 1. Balanced illumination effect 



2 Processes and Algorithms 

2.1 Noise Reduction 

The image is filtered with a median filter in order to reduce the image noise 
while preserving borders [5]. 

2.2 Projection 

The source image obtained from the balanced illumination system is RGB. To keep 
the software module compatible with a system built around an available 
high-resolution B&W industrial camera, a projection is performed at this point. In 
order to preserve as much DF information as possible in this process, a direction 
that statistically approximates the direction of the primary component of the 
Karhunen-Loève transform was selected. 
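
The idea can be sketched as follows: the primary Karhunen-Loève direction of a set of 
RGB pixels is the dominant eigenvector of their covariance matrix, and each pixel is 
reduced to one value by projecting it onto that direction. The example uses power 
iteration, which is a choice made for this illustration and not necessarily the 
authors' procedure; it would fail in the unlikely case that the starting vector is 
orthogonal to the dominant direction. 

```python
def principal_direction(pixels, iterations=200):
    """Unit vector along the primary component of a list of (R, G, B) pixels."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(3)] for i in range(3)]
    v = [3 ** -0.5] * 3  # start from the normalized gray axis (1, 1, 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # all pixels are identical: keep the current direction
        v = [x / norm for x in w]
    return v


def project(pixels, direction):
    """Project each (R, G, B) pixel onto the given direction."""
    return [sum(p[c] * direction[c] for c in range(3)) for p in pixels]
```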

After finding this direction for the set of images in base A1, no relevant 
variation in its coordinates was found for other images. Therefore a statistical 
mean direction obtained in a training period was used to project the other 
images. This approximated direction was called AKL. 



Fig. 2. Block diagram 

2.3 Non Uniform Illumination Compensation 

Non-uniform illumination was observed in the acquired images: the centre of the 
images is more illuminated than the border. Its effect is shown in Figure 3 as a 
surface representation. 




Fig. 3. Left: surface representation of the acquired image. Right: image after non 
uniform illumination correction 



To solve this problem a compensation step was introduced. A multiplicative model, 
$I(x,y) = P(x,y)\,F(x,y)$, was assumed, where $I$ is the original image, $F$ 
represents the uniformly illuminated image and $P$ is a parabolic surface with its 
maximum near the image centre. In order to estimate $P$, a least-squares method is 
used with samples of $I$ as input data. Knowing $I$ and $P$, it is possible to 
correct the acquired image using the multiplicative model. 
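
A minimal sketch of this compensation, assuming that the parabolic surface is a general 
second-degree polynomial in x and y and that it is fitted to background samples of the 
acquired image (both assumptions; the text does not give the parameterization or the 
sampling). The normal equations are solved by Gaussian elimination, and the correction 
divides the image by the fitted surface, as the multiplicative model suggests. In 
practice the coordinates would probably be normalized to keep the system well 
conditioned. 

```python
def _basis(x, y):
    """Monomials of a second-degree surface in x and y."""
    return [1.0, x, y, x * x, x * y, y * y]


def fit_paraboloid(samples):
    """Least-squares fit of c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2.

    samples: list of (x, y, value). Returns the six coefficients.
    """
    m = 6
    a = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for x, y, value in samples:          # accumulate the normal equations
        phi = _basis(x, y)
        for i in range(m):
            b[i] += phi[i] * value
            for j in range(m):
                a[i][j] += phi[i] * phi[j]
    for col in range(m):                 # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the surface")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):         # back substitution
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def surface(coeffs, x, y):
    """Value of the fitted surface at (x, y)."""
    return sum(c * phi for c, phi in zip(coeffs, _basis(x, y)))


def correct(image, coeffs):
    """Divide the image by the fitted surface (assumed positive everywhere)."""
    return [[image[y][x] / surface(coeffs, x, y) for x in range(len(image[0]))]
            for y in range(len(image))]
```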

2.4 Gabor Filters 

The aim of this process is to extract from the image as much information as 
possible about the presence of DF. Gabor filters [6] are used in order to enhance 
the dark fibres against a textured white background. The bank of filters is tuned 
for a maximum response in the presence of an outlying fibre, depending on its 
direction and diameter. 

The impulse responses of these filters are derived by scale transformations and 
rotations from the Gabor function (1), formed by a complex exponential modulated 
by a Gaussian. 

The basic frequency of the filters is represented by $f_h$. The impulse responses 
of the Gabor filters are given by 

$$f_{pq}(x,y) = a^{-p} f(x', y') \qquad (2)$$ 

$$x' = a^{-p}\,(x\cos\theta_q + y\sin\theta_q), \qquad y' = a^{-p}\,(-x\sin\theta_q + y\cos\theta_q)$$ 

where $a^{-p}$ is the scale factor and $\theta_q$ the selected direction. Each 
filter acts as a directional bandpass filter with parameters p and q. 

To design the filters, 3 central frequencies and 2 directions (vertical and
horizontal) were chosen. The highest frequency was placed at $f_h = 1/8$.
Following [7], with S = 3 and L = 4, the remaining filter parameters $a$,
$\sigma_x$, $\sigma_y$ and the lower and medium frequencies $f_l$ and $f_m$ were
calculated, giving the following values:

$$a = 2, \qquad \sigma_x = 4.497, \qquad \sigma_y = 1.038, \qquad f_m = \frac{1}{16}, \qquad f_l = \frac{1}{32}$$



To evaluate the filter orientations, the formula $\theta_q = \pi (q-1)/L$ was
used, with $q = 1 \ldots 4$.




Fig. 4. Behaviour of two filters at the same frequency but in vertical and horizontal 
directions. 



A FIR implementation was used to realize the filters. The kernels are
elongated, with each axis sized to span two Gaussian standard deviations in
the corresponding direction.

The convolution kernel is applied to a subset of the input image formed by
bidirectional subsampling in a 4 to 1 ratio.
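
As an illustration of how one kernel of the bank could be generated, the following sketch uses the parameter values given above. The form of the mother function (a Gaussian envelope multiplied by a complex exponential of frequency $f_h$ along the rotated x axis) is an assumption, since equation (1) is not reproduced here, and for simplicity the support is taken as square rather than elongated. The function name `gabor_kernel` is hypothetical.

```python
import numpy as np

def gabor_kernel(p, theta, a=2.0, sigma_x=4.497, sigma_y=1.038, f_h=1 / 8):
    """Complex FIR kernel f_pq(x, y) = a^-p * f(x', y') for scale index p.

    Assumed mother function: Gaussian envelope times exp(2*pi*j*f_h*x).
    """
    scale = a ** (-p)
    # Square support covering two standard deviations of the widest axis.
    half = int(np.ceil(2 * max(sigma_x, sigma_y) / scale))
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotated and scaled coordinates, as in equation (2).
    xr = scale * (xs * np.cos(theta) + ys * np.sin(theta))
    yr = scale * (-xs * np.sin(theta) + ys * np.cos(theta))
    envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    envelope /= 2 * np.pi * sigma_x * sigma_y
    carrier = np.exp(2j * np.pi * f_h * xr)
    return scale * envelope * carrier
```

With a = 2, the scale indices p = 0, 1, 2 give central frequencies of 1/8, 1/16 and 1/32, and `theta` values of 0 and pi/2 give the two orthogonal directions.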

Notice that the Gabor filtering separates the clusters corresponding to DF
and background in the histogram of the image. Figure 5 shows the histogram of
the expert-marked pixels on the image before and after Gabor filtering with
a horizontal kernel.

Fig. 5. Histogram of the expert-marked pixels on the image before (left) and after
(right) Gabor filtering with a horizontal kernel.



2.5 Thresholding 

The Gabor filter step generates 6 images: 2 orthogonal directions and 3 frequency
bands. Each one is binarized with a threshold given by a linear model:

$$U_i = a_i\,\mu_i + b_i\,\sigma_i + c_i, \qquad i = 1, \ldots, 6 \qquad (3)$$

The parameters of the linear model are estimated in a learning process using
the A1 image set. On this set an expert marked, for each Gabor filter response,
a set of pixels corresponding to DF and another set of similar size for the
background cluster. These sets carry the information needed to estimate an
optimal threshold for these responses, as explained below.

Parameters a, b and c of equation 3 are adjusted by a least squares method
using the optimal thresholds estimated with the A1 data set.

Optimal threshold. Let J be a statistical error cost function formed by the prior
cluster probabilities $P_F$ and $P_B$ (DF and background respectively), the
classification error probabilities $\alpha$ (classifying a DF pixel as background)
and $\beta$ (classifying a background pixel as DF), and the assigned error costs
$C(\alpha)$ and $C(\beta)$:

$$J = P_F\,C(\alpha)\,\alpha + P_B\,C(\beta)\,\beta \qquad (4)$$

The optimal threshold is the minimum of the function J. A Gaussian model
for the cluster distributions has been used.

The minimum of J satisfies the following equation

$$(\sigma_F^2 - \sigma_B^2)\,x^2 + 2(\sigma_B^2\mu_F - \sigma_F^2\mu_B)\,x + \mu_B^2\sigma_F^2 - \mu_F^2\sigma_B^2 - 2(\sigma_F\sigma_B)^2 \ln\left(\lambda\,\frac{\sigma_F}{\sigma_B}\right) = 0 \qquad (5)$$

with

$$\lambda = \frac{P_B\,C(\beta)}{P_F\,C(\alpha)}.$$

Once the Gaussian parameters $\mu_F$, $\mu_B$, $\sigma_F$ and $\sigma_B$ are
extracted from the training set, the optimal threshold is deduced from equation 5.
Changing the $\lambda$ parameter allows the system to vary its sensitivity,
penalizing false positives or false negatives.
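
A short sketch of how the threshold could be obtained from the quadratic equation 5 is given below. The function name and the rule used to choose between the two roots (prefer the root lying between the two cluster means) are assumptions of this sketch, not details given in the text.

```python
import math

def optimal_threshold(mu_f, sigma_f, mu_b, sigma_b, lam=1.0):
    """Solve equation (5) for the threshold x under the Gaussian cluster model."""
    a = sigma_f**2 - sigma_b**2
    b = 2.0 * (sigma_b**2 * mu_f - sigma_f**2 * mu_b)
    c = (mu_b**2 * sigma_f**2 - mu_f**2 * sigma_b**2
         - 2.0 * (sigma_f * sigma_b) ** 2 * math.log(lam * sigma_f / sigma_b))
    if abs(a) < 1e-12:
        # Equal variances: the equation degenerates to a linear one.
        if b == 0:
            raise ValueError("clusters are identical; no threshold exists")
        return -c / b
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real threshold for these parameters")
    roots = [(-b + math.sqrt(disc)) / (2.0 * a), (-b - math.sqrt(disc)) / (2.0 * a)]
    lo, hi = sorted((mu_f, mu_b))
    between = [r for r in roots if lo <= r <= hi]
    # Assumed selection rule: prefer the root between the cluster means.
    if between:
        return between[0]
    return min(roots, key=lambda r: abs(r - 0.5 * (lo + hi)))
```

For two clusters with equal variances and $\lambda = 1$, the threshold falls midway between the two means; increasing or decreasing `lam` moves it towards one of the clusters.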
Fig. 6.



2.6 Dark Fibres Segmentation 

The next step is to combine the 6 binarized images produced by the Gabor filter
and threshold steps into one binary output image with the DF pixels marked.

First, the 6 binarized images are added pixel by pixel. On the resulting image,
a directional hysteresis is applied to select the pixels belonging to DF.

Directional hysteresis thresholding uses two thresholds, $T_{high}$ and
$T_{low}$. Pixels with values above $T_{high}$ are set as DF starting points.
A suitable direction is searched for each starting point, and a walking process
then adds pixels in this direction as long as the $T_{low}$ test is passed.

At each step, the process allows the direction to vary by 30°; if the test is
not passed, another attempt is made with the walking step doubled.

The thresholds were set to $T_{high} = 5$ and $T_{low} \in [0.5, 3]$, according
to the system sensitivity chosen.

2.7 Morphologic Operations 

A labelling process [5] is applied to the resulting image, marking the regions
which are associated with DF. The local and global angles and the diameter are
calculated for each region. These parameters are used to perform morphologic
operations [13] that join regions belonging to the same dark fibre and eliminate
small noisy regions.



3 Results 



The image set B, formed by 100 images, was marked by an expert using the
standardized manual procedure and processed by the automatic system. According
to the SUL expert, 87 of these 100 images have at least one DF. The total
number of DF in the set is 118.

Table 1 presents the automatic success rates, categorized by fibre coloration
level, as a comparison between the expert's decisions and the system's decisions
on the same image. The rates in this table were obtained with $\lambda = 2$ and
$T_{low} = 1.5$.



Table 1.

Fibre kind   Fibres detected by expert   System success   Success rate   95% confidence interval
5            16                          15               94%            (77%, 100%)
6            29                          26               90%            (76%, 98%)
7            42                          42               100%           (98%, 100%)
8            31                          31               100%           (97%, 100%)



In addition to the results in the table above, the system produced two false
negatives in the 100-image set.

Figure 6 illustrates some results. The first column shows the image after the 
non uniform illumination step. For each image, the second column shows the DF 
marked by the expert. The DF colour category is detailed in a colour scale as 
follows: red: 5, yellow: 6, green: 7 and cyan: 8. The third column shows the dark 
fibres detected by the automatic system.



Abstract. A Musical Style Identification model based on Grammatical Inference
(GI) is presented. Under this model, regular grammars are used for modeling
Musical Style. Style Classification can be used to implement or improve
content based retrieval in multimedia databases, musicology or music education.
In this work, several GI techniques are used to learn, from examples of
melodies, a stochastic grammar for each of three different musical styles. Then,
each of the learned grammars provides a confidence value of a composition
belonging to that grammar, which can be used to classify test melodies. A very
important issue in this case is the use of a proper music coding scheme, so
different coding schemes are presented and compared, achieving a 3 %
classification error rate.



1 Introduction 

Grammatical Inference (GI) aims at learning models of languages from examples of
sentences of these languages. Sentences can be any structured composition of
primitive elements or symbols, though the most common type of composition is
concatenation. From this point of view, GI finds applications in the many areas in
which the objects or processes of interest can be adequately represented as strings
of symbols. Perhaps the most conventional application areas are Syntactic Pattern
Recognition (SPR) and Language Modeling. But there are many other areas in which GI
can lead to interesting applications. One of these areas is Musical Style
Identification (MSI). Here, the very notion of language explicitly holds, where
primitive symbols or “notes” are adequate descriptions of the acoustic space, and the
concatenation of these symbols leads to strings that represent musical sentences. By
adequately concatenating symbols of a given musical system, a musical event emerges.
However, not every possible concatenation can be considered a “proper” event. Certain
rules dictate what can or cannot be considered an appropriate concatenation, leading
to the concept of musical style. The main features of a musical style are rhythm and
melody, which are directly related to the rules used to concatenate the duration and
pitch of sounds, respectively.

The interest in modeling musical style resides in the use of these models to
generate music (Automatic Composition) and to classify music by style (MSI), which
are our areas of interest. The MSI area has recently been explored, mostly in the
field of multimedia databases, with the aim of improving content based retrieval by
allowing indexing by musical style in addition to other suitable indexes. Other
applications include musicology (finding authors for anonymous pieces) and music
education. Some AI techniques that have been employed are Hidden Markov Models [11],
Self-Organising Maps [12] and Neural Networks [17]. This paper focuses on our MSI
work [4] [5], which has been extended by adding new coding schemes for music.



2 Grammatical Inference 

In this section we first briefly review the field of GI and then explain the
techniques used in our study. Grammatical Inference is a well established
discipline originated by the work of Gold on “Language Identification in the Limit”
[8]. Perhaps the most traditional application field of GI has been Syntactic Pattern
Recognition, but there are many other potential applications. In general, GI aims at
finding the rules of a grammar G, which describes an unknown language, by means of
a positive sample set $R^+ \subseteq \{\alpha \mid \alpha \in L(G)\}$ and a negative
sample set (of samples that should be rejected)
$R^- \subseteq \{\beta \mid \beta \in \Sigma^* - L(G)\}$. A more limited framework,
but more usual in practice, is the use of only positive samples. The main problem
lies in finding a more “abstract” or general grammar G' so that $R^+ \subset L(G')$
and $L(G') - R^+$ (the extra-language) contains only strings with “similar features”
to the strings from $R^+$. A grammar G is said to be identified in the limit [8] if,
for sufficiently large sample sets, L(G') = L(G). Grammars and Formal Languages can
also be seen from a probabilistic viewpoint, in which different rules have different
probabilities of being used in the derivation of strings. The corresponding extension
of GI is called Stochastic Grammatical Inference, and all the techniques used in our
study work in this way. Next, we concisely explain three of the techniques used in
our study to infer the grammars employed for composing and identifying musical
styles. Other techniques employed with less success are Regular Positive Negative
Inference (RPNI) [10] and a State Merging Technique based on probabilistic criteria
called ALERGIA [3]. These techniques are fairly well known in the GI community and
have proven useful in other fields.

The Error-Correcting Grammatical Inference (ECGI) technique is a GI heuristic
that was explicitly designed to capture relevant regularities of concatenation and
length exhibited by substructures of unidimensional patterns. It was proposed in [14]
and relies on error-correcting parsing to build up a stochastic regular grammar
through a single incremental pass over a positive training set. This is achieved
through a Viterbi-like, error-correcting parsing procedure [6] which also yields the
corresponding optimal sequence of both non-error and error rules used in the parse.
Similarly, the parsing results are also used to update frequency counts from which
probabilities of both non-error and error rules are estimated.

N-Grams (a class of Markov models) are among the most widely used models in Natural
Language Processing [9]. An N-Gram is a sequence of symbols of length N, the first
N-1 of which are the context. N can in theory be anything from 1 upwards, but
certain values are better than others at capturing the characteristics of the
language. The larger the value of N, the more context is captured. Although a large
N might seem useful, larger is not always better: eventually the sequences learned
are no longer just characteristic of the corpus but are the exact sequences in the
corpus. The parameter estimation method can be consulted in [9]. In our study,
modeling with N-Grams is performed using the CMU-Cambridge Statistical Language
Modeling Toolkit (SLM-Toolkit) [13].

The k-TSI technique infers k-Testable Languages in the Strict Sense (k-TSSL) in the
limit. It has been demonstrated that they are equivalent to N-Grams with N = k [15].
The main difference between them is that an N-Gram is generally assumed to embody
the (N-i)-Grams [i = 1..N], while a k-TSSL model consists only of the model of order
k. The inference of k-TSSLs was discussed in [7], where the k-TSI technique was
proposed.

As each inferred automaton or N-Gram model represents a musical style, “recog- 
nizing” the style of a test melody consists in finding the automaton which best recog- 
nizes this melody. This can be best achieved by using an algorithm that performs sto- 
chastic Error-Correcting Syntactic Analysis through an extension of the Viterbi 
algorithm [6]. The probabilities of error rules (Insertion / Deletion / Substitution) can 
be estimated from data [1]. The Analysis Algorithm returns the probability that the 
analyzed melody is generated (with error correction) by the automaton. By analyzing
the same melody with different automata, we classify it as belonging to the musical
style (language) represented by the automaton that gives the largest probability.
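
The decision rule (one model per style, choose the style whose model gives the largest probability) can be sketched as follows. This is only an illustration: it uses a bigram model with add-one smoothing as a simple stand-in, not the back-off smoothing of the SLM-Toolkit nor the error-correcting Viterbi analysis used in the study, and all function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(melodies):
    """Count bigrams over symbol sequences; returns (bigrams, contexts, vocabulary)."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for melody in melodies:
        symbols = ["<s>"] + list(melody) + ["</s>"]
        vocab.update(symbols)
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def log_prob(model, melody, vocab_size):
    """Log-probability of a melody under a bigram model with add-one smoothing."""
    bigrams, contexts, _ = model
    symbols = ["<s>"] + list(melody) + ["</s>"]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

def classify(models, melody):
    """Return the style whose model assigns the melody the largest probability."""
    vocab_size = len(set().union(*(m[2] for m in models.values())))
    return max(models, key=lambda style: log_prob(models[style], melody, vocab_size))
```

Here `models` is a dictionary mapping a style name to the output of `train_bigram` for the training melodies of that style.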



3 How to Code Music for Syntactic Pattern Recognition 

GI, as a Syntactic Pattern Recognition (SPR) technique, works with symbol strings.
Even though only the duration and pitch of sounds (as the main features of music)
are used in this work, the way they are represented implies the inclusion of more or
less musical information. The amount and/or meaning of this musical information can
be key to success in Musical Style Identification (MSI). So, we have to deal with
the selection of pitch and duration representations and their coding into symbol
strings. Much effort has been devoted within the Computer Music research community
to musical representation systems, and it is not clear that one system is always
better than the others [2] [16] [18]; this depends strongly on the application and
the recognition paradigm. In the next subsections, we present the pitch and duration
representations that, in our opinion, are most suitable for an SPR technique such as
GI, as well as some examples of encoding into symbol strings. Each representation is
briefly commented on according to its performance for MSI and Automatic Composition
(AC) with GI.

3.1 Pitch Representation

Absolute. Pitch is most often represented either by the traditional pitch naming sys- 
tem (e.g. F#4-G#4-A4) or as absolute pitch (e.g. in MIDI: 66, 68, 69). It is the same 
representation as in musical scores, but may be insufficient for applications in tonal 
music. The main problem is that transpositions are not accounted for (e.g. repeating 
the same pitch motive transposed in different samples or within the same one). 

Relative. A solution for the transposition problem is the use of the relative pitch be- 
tween notes. That is, the interval (number of semitones) between two notes. There 
exists a somewhat ‘peculiar’ relationship between pitch strings and pitch interval 
strings. If one pitch interval in a string of pitch intervals is altered then all the suc- 
ceeding notes are altered (transposed) [2]. So a change in a string of pitches and in a 
string of pitch intervals is not exactly the same thing. This effect appeared in our AC 
experiments [4]. 

Melodic Contour. Pitch interval encodings readily lend themselves to the
construction of a number of more abstract representations of musical strings, such
as contour strings. Intervals can be categorised into classes according to their
sign. Instead of taking the absolute or relative pitch, what is coded is whether the
next pitch goes “up” (U), goes “down” (D) or is “equal” (E) to the last one. So,
melodic contour can be represented as a string from the alphabet {U, D, E}, a very
small alphabet which provides less musical information than the previous
representations.

Relative to Tonal Centre. This coding scheme arose along with our experiments in 
AC [5] in order to correct the relative representation ‘peculiar’ effect mentioned be- 
fore. Pitch is coded as the distance to the tonal centre or tonic in semitones. It in- 
cludes more musical information than the others, as it allows characterizing relation- 
ships between pitches and tonality. 
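
The four pitch representations can be illustrated with a small sketch operating on MIDI pitch numbers. The function names are hypothetical, the tonic value is supplied by the caller, and special symbols for the start of a melody or for rests are not modelled here.

```python
def relative_pitch(pitches):
    """Intervals in semitones between consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def melodic_contour(pitches):
    """Contour string over the alphabet {U, D, E}."""
    return ["U" if d > 0 else "D" if d < 0 else "E" for d in relative_pitch(pitches)]

def relative_to_tonal_centre(pitches, tonic):
    """Distance in semitones from each note to the tonic."""
    return [p - tonic for p in pitches]

# The example F#4-G#4-A4 (MIDI 66, 68, 69) and the same motive transposed up a tone.
motive = [66, 68, 69]
transposed = [p + 2 for p in motive]
print(relative_pitch(motive), relative_pitch(transposed))
print(melodic_contour(motive))
```

The absolute strings of the two motives differ, while their interval and contour strings coincide, which is the transposition invariance discussed above.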



3.2 Duration Representation 

In terms of the rhythmic component of musical strings, almost the same
representations as for pitch can be applied. It should be noted, however, that the
problems that arise with pitch representations (highlighted in the previous section)
also apply to duration representations.

Absolute. It is a direct translation of the representation used in musical scores
(e.g. whole note, half note, quarter note, and so on). It is the most commonly
applied in musical string processing algorithms [2] and, to date, is the only one we
have tested.

Relative. It is well known that listeners usually remember a rhythmic pattern as a
relative sequence of durations that is independent of an absolute tempo. So, with
absolute duration encoding, the same rhythmic pattern written in two different
meters will be considered as two different patterns. Representing rhythm as duration
ratios can overcome augmentations or diminutions of a rhythmic pattern (Fig. 1).

Rhythmic Contour. Like melodic contour, durations can be coded in terms of contour
strings. Instead of taking the absolute or relative duration, what is coded is
whether the next duration is “shorter” (S), “longer” (L) or “equal” (E).



dur.     12   2   2   8          6   1   1   4
ratios     1/6   1   4             1/6   1   4

Fig. 1. Two rhythmic patterns that match at the level of duration ratios are in fact the same
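
The values in Fig. 1 can be reproduced with a short sketch using exact fractions. The function names are hypothetical.

```python
from fractions import Fraction

def duration_ratios(durations):
    """Ratio of each duration to the previous one."""
    return [Fraction(b, a) for a, b in zip(durations, durations[1:])]

def rhythmic_contour(durations):
    """Contour string over the alphabet {S, L, E} (shorter, longer, equal)."""
    return ["L" if b > a else "S" if b < a else "E"
            for a, b in zip(durations, durations[1:])]

# The two patterns of Fig. 1: the second is a diminution of the first.
print(duration_ratios([12, 2, 2, 8]) == duration_ratios([6, 1, 1, 4]))
print(rhythmic_contour([12, 2, 2, 8]))
```

Both patterns yield the ratios 1/6, 1 and 4, so they map to the same string under the relative representation.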



3.3 Musical String Representation 

Notes can be coded as one symbol (what we call not-splitted encodings), or with two
symbols (what we call splitted encodings). At the beginning of our study, only
not-splitted encodings were used, but we realized that these coding schemes could be
establishing a relationship between pitch and length stronger than the one generally
existing in music. Therefore, splitted encodings were introduced as a modification
of the previous ones, simply separating the pitch and duration symbols with a space
character (Fig. 2); they obtained better results in MSI (see section 4).

Combining the pitch and duration representations described in the previous
subsections, 12 not-splitted encodings can be defined. Splitting them gives another
12 and, if pitches and durations are used stand-alone, 7 more emerge. As a result, we
have 31 different coding schemes that can be tested. For naming, the format “pitch
representation name - duration representation name” is used (Fig. 2). The term
“splitted” is not used with not-splitted encodings. To date, 13 coding schemes have
been tested, and the results are presented in the next section. To obtain the
musical string for these coding schemes, we have used numbers for pitch
representations (except for the contour representation) and letters for duration
representation (only the absolute representation has been tested). Thus, musical
strings are easier to understand, as can be seen in Fig. 2.
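
The count of 31 coding schemes and the difference between not-splitted and splitted strings can be checked with a short sketch. The names given to the stand-alone schemes and the function names are hypothetical.

```python
from itertools import product

PITCH = ["Absolute", "Relative", "Contour", "Tonal Centre"]
DURATION = ["Absolute", "Relative", "Contour"]

def coding_schemes():
    """Enumerate the not-splitted, splitted and stand-alone coding schemes."""
    joined = [f"{p}-{d}" for p, d in product(PITCH, DURATION)]
    splitted = [f"Splitted {name}" for name in joined]
    alone = [f"{p} pitch" for p in PITCH] + [f"{d} duration" for d in DURATION]
    return joined + splitted + alone

def encode(pitch_symbols, duration_symbols, splitted=False):
    """Build the musical string; splitted encodings separate pitch and duration."""
    sep = " " if splitted else ""
    return " ".join(f"{p}{sep}{d}" for p, d in zip(pitch_symbols, duration_symbols))

print(len(coding_schemes()))
print(encode(["22", "26", "24"], ["c", "c", "c"]))
print(encode(["U", "D"], ["c", "n"], splitted=True))
```

In the splitted form every pitch symbol and every duration symbol is a separate token, so the alphabet seen by the GI technique is the union of the pitch and duration alphabets rather than their product.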





Absolute-Absolute:

22c 26c 24c 24c 24n 0c 24c 27c 24c 26c 27c 24c 22c 26c 22c 24c 26c 24n

Relative-Absolute:

Sc 4c -2c 0c 0n 99c 0c 3c -3c 2c 1c -3c -2c 4c -4c 2c 2c -2n

Splitted Contour-Absolute:

S c U c D c E c E n R c E c U c D c U c U c D c D c U c D c U c U c D n



Fig. 2. A Gregorian style score coded with different representations



4 Experiments

We chose 3 Western musical styles from different epochs and took 100 sample
melodies from each one. These samples were 10 to 40 seconds long. The first style
was Gregorian Chant. As a second style, we used passages from the sacred music of
J. S. Bach. The third style consisted of passages from Scott Joplin's Ragtimes.
Experiments in Style Identification were performed using ECGI, k-TSI, N-Grams and
other GI techniques (mentioned in section 2). Three automata (one per style) are
inferred with each GI technique, trying different values of k (with k-TSI) and N
(with N-Grams). Test melodies are analyzed to see which of the learned automata can
generate them with the greatest probability. Given the small size of the available
corpus, 10-fold Cross-Validation was used to measure the identification accuracy of
the different techniques and coding schemes. The Average Classifying Error in
identification for each style was obtained, and the best results are presented here.
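
As a reminder of how 10-fold Cross-Validation partitions a small corpus, the following sketch generates the training and test indices for each fold. How the folds were actually formed in the experiments is not stated, so the interleaved assignment used here is an assumption, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    # Interleaved assignment of samples to folds (an assumption of this sketch).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# With 100 melodies per style, each fold trains on 90 and tests on 10 of them.
for train, test in k_fold_indices(100, 10):
    assert len(train) == 90 and len(test) == 10
```

The classifying error is then averaged over the 10 folds.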



4.1 Results 

For the sake of conciseness, results are summarized in Table 1, which shows the
best Average Classifying Error for each GI technique with some of the tested
encodings. Given the N-Gram results, the stand-alone and tonal centre encodings have
been tested only with N-Grams. The stand-alone results are worse than those obtained
with the other coding schemes, the best being a 6.66 % classifying error when using
the tonal centre pitch representation.

Table 1. Classifying Error in Musical Style Identification experiments with the different GI 
techniques and coding schemes employed 





Coding scheme                     ECGI       k-TSI     N-Gram
Absolute-Absolute                 34.33 %    10 %      4.66 %
Relative-Absolute                 13.66 %    8.66 %    4.33 %
Contour-Absolute                  8.66 %     7 %       5.33 %
Tonal Centre-Absolute             -          -         5.33 %
Splitted Absolute-Absolute        9.66 %     5.66 %    3.33 %
Splitted Relative-Absolute        7.66 %     5 %       3 %
Splitted Contour-Absolute         10 %       5.66 %    5.33 %
Splitted Tonal Centre-Absolute    -          -         4.66 %



4.2 Discussion 

Analyzing these results by GI technique, the best is clearly the N-Gram
technique. A 3 % error in Style Identification is obtained, compared with 5 % using
k-TSI and 7.66 % using ECGI. Although comparisons with the success rates of other
style identification models are not very meaningful unless the same datasets are
used, compared with other similar studies [11] [12] [17] these average success rates
can be considered quite good. It is worth noting that GI techniques tend to need a
larger quantity of training samples to get good results, so our results are expected
to improve if the number of training samples is increased. The advantage of the
N-Gram technique over k-TSI and ECGI relies on the use of the back-off smoothing
procedure within the analysis algorithm [9]. It is well known in the Pattern
Recognition community that N-Gram inference with back-off smoothing outperforms
many other Syntactic Pattern Recognition techniques, and our study confirms this.

Although the GI technique used for modeling musical style is very important, as
seen above, our study has shown that the musical string representation is decisive
for the results. The best results are clearly obtained with the relative pitch
representation; consequently, relative duration representation is expected to be a
good option to try in the future, especially the splitted relative-relative
encoding. The results of the stand-alone encodings are poor, showing the need for
more musical information within the coding scheme. Thus, we do not consider the
future use of the remaining ones (relative duration and contour duration). Although
Automatic Composition results were improved with the tonal centre representation, in
MSI the outcome has been very different. This may be because this approach does not
account for transpositions of a pattern within the same piece. An important fact is
that contour pitch encodings, especially the splitted one, have not achieved results
as good as the others. This is due to the small size of the alphabet (symbols) for
these coding schemes (e.g. only 9 symbols for Gregorian style in the splitted
encoding).

Another conclusion of the study is that splitted encodings are clearly better than
joined pitch and duration representations. As a result of this discussion, for
future studies we can discard contour and stand-alone representations, leaving 6
combinations with relative duration representations to be tested. Of these, the
splitted encodings are expected to be the best.



5 Conclusions and Future Works 

A Musical Style Identification (MSI) model based on Grammatical Inference has been
presented. Different coding schemes for music have been proposed and compared
according to their suitability for working with Syntactic Pattern Recognition
techniques and the results obtained in our experiments. The results of this work
show the need for proper music coding schemes, the most important factor being the
use of two separate symbols (pitch and duration) for the encoding of each musical
note. The best results in MSI have been obtained with the N-Gram technique,
achieving a 3 % classifying error. Several lines of study can be followed to try to
improve the results. First, the amount of data used so far is insufficient, and
better performance is expected by increasing the number of training samples. Of the
coding schemes presented in this work, there are still 6 to be tested. Of course,
other coding schemes must be explored, such as trees or strings labelled with
information about modulations or harmony. Once these tasks are dealt with, we could
employ entire musical pieces as samples, and not
just small fragments, as was done in this study.



Abstract. Automatic marble classification based on visual appearance is an
important industrial issue, but due to the presence of a high number of randomly
distributed colors and their subjective evaluation by human experts, the
problem remains unsolved. In this paper, several new measures based on similarity
tables built by human experts are introduced. They are used to improve the
behavior of some clustering algorithms and to quantitatively characterize the
results, increasing the correspondence of the measures to the visual appearance of
the results. The obtained results show the effectiveness of the proposed methods.



1 Introduction 

Ornamental stones are quantitatively characterized by properties such as geological- 
petrographical and mineralogical composition, or mechanical strength. However, the 
properties of such products differ not only in terms of type, but also in terms of origin, and 
their variability can also be significant within the same deposit or quarry. Though useful, 
these methods do not fully solve the problem of classifying a product whose end-use 
makes appearance so critically important. Appearance is conditioned not only by the kind 
of stone, but it also depends on the subjective evaluation of “beauty”. Traditionally, the 
selection process is based on human visual inspection, giving a subjective characterization
of the materials’ appearance, instead of an objective, reliable measurement of the visual 
properties, such as color, texture, shape and dimensions of their components. Yet, quality 
control is essential to keep marble industries competitive: shipments of finished products 
(e.g. slabs) must be of uniform quality, and the price demanded for a product, particularly 
a new one, must somehow be justified. Thus, it is very important to have a tool for the 
objective characterization of appearance. 

In this paper, we are concerned with marble classification. Several clustering
techniques for this purpose are presented and discussed. In order to support these
algorithms and to better emulate results obtained by human experts, several new
distances and measures are proposed. Hence, we introduce a so-called Weighted
Manhattan (WM) distance, whose weights are determined by a genetic algorithm, which
optimizes a distance between marbles based on a similarity table built by human
experts. This distance is used to automatically cluster the marbles. The clustering
techniques obtain the clusters based on a set of features. These features are derived
by a quadtree based segmentation analysis of marble images, as previously presented
and discussed in [6,7]. The clustering results are evaluated by the proposed new
error measures, which are motivated by the fact that standard error measures do not
agree satisfactorily with the classification results obtained by visual inspection of
human experts. The proposed measures are the similarity error and the similarity
score, which are based on expert evaluations.

This paper is organized as follows. In Section 2 the distance measures are presented, 
and the weighted Manhattan distance, the similarity error and the similarity score, which 
are all proposed in this paper, are described. Section 3 presents a short description of the 
clustering algorithms used in this paper, namely, simulated annealing, fuzzy c-means, 
neural networks, and Takagi-Sugeno fuzzy models optimized by genetic algorithms. 
The application to real marbles data is discussed in Section 4. This section presents the 
results demonstrating the validity of the approaches introduced in this paper. Section 5 
concludes the paper, giving guidelines for future developments in this important area.



2 Distance Measures 

The evaluation of the obtained results can be a significant problem in situations where 
the visual appearance dictates the qualitative results. Sometimes results are visually very 
good but this fact is not revealed using standard error measures. On the other hand, a team 
of experts can build a so-called similarity table where, for each marble, those that could
be considered very similar in appearance, from the available collection, are indicated. In
the application in this paper, this number varies from 2 up to 10 marbles with an average
of 4. The marbles considered to be similar to a given marble are called neighbors. The
table containing the marbles that are similar to a given example is called the complete
similarity (CS) table. From this table another was derived, with a maximum number of
4 neighbors, and where a weight $\omega$ reflecting the number of votes received from the
experts was added to each neighbor. This table is named the weighted similarity (WS) table.



2.1 Weighted Manhattan Distance 

The weighted similarity table can be used to develop a new distance measure, the so- 
called Weighted Manhattan distance, which can find correct neighbors for new marbles 
in a simple way. The WM distance between the points Xm and Xn is given by: 

N 

'y ^ , ( 1 ) 

i=l 

where i = 1, ..., N are the coordinates of the space (features) with dimension N. The
weights $\alpha_i$ and $\beta_i$ are determined by optimizing the cost function

$$J^k = \sum_{i=1}^{N_d} \sum_{v=1}^{n_{max}} \omega_{iv}\, S(n_{iv}, n_{iv}^k) \qquad (2)$$

where k is the iteration of an optimization algorithm, $N_d$ is the total number of data
points, $n_{max}$ is the maximum number of neighbors defined for each data point, $n_{iv}$
is the v-th neighbor of data point i, $n_{iv}^k$ is the v-th neighbor defined at step k,
and $\omega_{iv}$ is the weight, defined by the experts, associated with the v-th neighbor
of data point i. The function S is simply given by

$$S(n_{iv}, n_{iv}^k) = \begin{cases} 0, & \text{if } n_{iv} = n_{iv}^k \\ 1, & \text{otherwise} \end{cases} \qquad (3)$$



Note that in this paper the data points are marbles, $N_d$ is equal to 112, and $n_{max}$
was set to 4. A standard real-coded genetic algorithm [5] was used to optimize the
weights $\alpha_i$ and $\beta_i$ in (1). Due to lack of space, we do not present the
genetic algorithm parameterization in this paper.



2.2 Similarity Error and Similarity Score 

This paper aims to cluster a given collection of marbles from a set of features, in order
to classify these marbles. Besides the visual appearance of the results, which
is in general very good, it is important to evaluate the classification results quantitatively
in order to compare different algorithms. However, the most common error measures,
based on the differences between the original marble classification and the attributed
one, are not consistent with the visual quality of the clusters. This is mainly because
experts find it difficult to attribute a single class to a given marble: most marbles could
be classified in a different, although similar, way. Given the importance of visual
appearance, the CS table can be used to evaluate the clustering algorithms. Therefore,
this paper introduces two new measures, called similarity score and similarity
error, respectively.

Let $V_{x_i}$ be the set of neighbors of the marble $x_i$ and $C_j$ be the cluster $j$ with $n_j$
elements. A score for the cluster $C_j$ can be defined as

$$ SC_j = \sum_{i=1}^{n_j} [\text{summand illegible in the source}] \qquad (4) $$



The global score of a given clustering algorithm for the total number of clusters $n_c$ is
given by:

$$ SC = \sum_{j=1}^{n_c} SC_j \qquad (5) $$

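On the other hand, a marble is wrongly assigned to a cluster if none of the marbles
in its neighborhood belongs to that cluster. So a similarity error measure can be defined as
follows:

$$ SE_j = \sum_{i=1}^{n_j} f(V_{x_i} \cap C_j) , \qquad (6) $$

where

$$ f(V_{x_i} \cap C_j) = \begin{cases} 1, & \text{if } V_{x_i} \cap C_j = \emptyset \\ 0, & \text{otherwise} \end{cases} \qquad (7) $$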



The global error of a given clustering algorithm for the total number of clusters $n_c$
is given by:

$$ SE = \sum_{j=1}^{n_c} SE_j \qquad (8) $$

This measure can be normalized in order to facilitate comparisons. The normalization
is given by:

$$ se = \frac{SE}{N_d} \qquad (9) $$



The similarity score SC and the similarity error SE are related but not dual, as will
become clear from the results presented in Section 4.2.
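
The two measures can be illustrated with a minimal Python sketch. The similarity error follows (6)-(8) directly. Two points are assumptions made only for this illustration: the score of a marble is taken to be the number of its neighbors that fall in its own cluster, since the summand of (4) is not legible in the source, and the normalization divides by the total number of marbles. The function names and the toy data are hypothetical.

```python
def similarity_error(clusters, neighbors):
    """Count marbles that share their cluster with none of their neighbors."""
    errors = 0
    for members in clusters:
        member_set = set(members)
        for marble in members:
            # f = 1 when the neighbor set and the cluster do not intersect
            if not (set(neighbors.get(marble, ())) & member_set):
                errors += 1
    return errors


def normalized_similarity_error(clusters, neighbors):
    """ASSUMPTION: normalize by the total number of clustered marbles."""
    n_marbles = sum(len(members) for members in clusters)
    return similarity_error(clusters, neighbors) / n_marbles


def similarity_score(clusters, neighbors):
    """ASSUMPTION: score = number of neighbors found in the marble's own cluster."""
    score = 0
    for members in clusters:
        member_set = set(members)
        for marble in members:
            score += len(set(neighbors.get(marble, ())) & member_set)
    return score


# Toy CS table: each marble maps to the marbles the experts consider similar.
cs_table = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"], "e": ["a"]}
partition = [["a", "b"], ["c", "d", "e"]]
print(similarity_error(partition, cs_table), similarity_score(partition, cs_table))
```

In the toy partition, marble "e" has its only neighbor in the other cluster, so it is the single marble counted by the error, while the score counts the neighbor pairs that were kept together. This also shows why the two measures are related but not dual.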



3 Clustering Algorithms 

This paper uses both supervised and unsupervised clustering methods. The unsupervised
methods do not use the training set classification to derive the clusters. Instead, these
algorithms optimize a cost function, and the classification of a given
data point (marble) is obtained from the distance of its features to the respective
clusters. This paper uses two unsupervised clustering algorithms: simulated
annealing and fuzzy c-means. In the supervised methods, the classification given in the
training set is used to derive the clusters. The classification is then obtained by
simulating the derived model, where the inputs are the marbles to be classified.
The supervised methods applied in this paper are neural networks and fuzzy models
optimized by a genetic algorithm.

Let $\{x_1, \ldots, x_{N_d}\}$ be a set of $N_d$ data objects, where each object is an instance
represented by the vector $x_k \in \mathbb{R}^N$, which is described by a set of $N$ features. The
set of data objects can then be represented as an $N_d \times N$ data matrix $X$. The clustering
algorithms used in this paper determine a partition of $X$ into $C$ clusters.

3.1 Simulated Annealing 

This optimization algorithm simulates the annealing process by reproducing the
crystallization of particles. These particles move during the solidification process,
which occurs as the temperature decreases. In the final stage, the system reaches the
minimal energy configuration [8]. The process starts at an initial temperature, which
is chosen by considering the number of changes that increase the system energy, so
that the system can escape from local minima. The algorithm defines the proportion of
acceptable changes that increase the system's energy. The cost function is given
by the Euclidean, Manhattan, or weighted Manhattan distances between the data points
and the cluster centers, and indicates whether or not the system is converging
to a lower energy state. The number of iterations performed at each temperature is
proportional to the number of elements, i.e. to the size $N_d$ of the data set.
The temperature $T$ must decrease between successive energy levels.
For consecutive levels, the temperature decrement is given by

$$ T_{k+1} = d \cdot T_k \qquad (10) $$

The stopping criterion can be a predetermined number of temperature decreases, or the
final cost remaining unchanged after several consecutive temperature decreases.
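
The geometric cooling rule in (10) can be sketched in a few lines of Python. The schedule itself follows the text; the acceptance test uses the Metropolis rule, which is a common choice but an assumption here, since the paper does not state the acceptance rule it uses. The function names and the initial temperature in the example are hypothetical.

```python
import math
import random


def cooling_schedule(t0, d=0.9, n_decreases=100):
    """Return the temperatures obtained by repeatedly multiplying by d."""
    temperatures = [t0]
    for _ in range(n_decreases):
        temperatures.append(d * temperatures[-1])
    return temperatures


def accept(delta_cost, temperature, rng=random):
    """ASSUMPTION: Metropolis rule. Improvements are always accepted,
    cost increases are accepted with probability exp(-delta / T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


temps = cooling_schedule(t0=1.0, d=0.9, n_decreases=100)
print(len(temps), temps[1], temps[-1])
```

With d = 0.9 and 100 decreases, the final temperature is a tiny fraction of the initial one, so late in the run almost no cost-increasing change is accepted.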

3.2 Fuzzy C-Means

Considering the set of data objects $X$, the fuzzy c-means algorithm determines a fuzzy
partition of $X$ into $C$ clusters by computing an $N_d \times C$ partition matrix $U$ and the
$C$-tuple of corresponding cluster prototypes [2]:

$$ V = [v_1, \ldots, v_C] . \qquad (11) $$

Most often, the cluster prototypes are points in the cluster space, i.e. $v_i \in \mathbb{R}^N$. The
elements $u_{ik} \in [0, 1]$ of $U$ represent the membership of data object $x_k$ in cluster $i$. This
paper uses the standard fuzzy c-means algorithm, as described in [2,1]. Note, however,
that two distance measures are used: the Euclidean and the WM distances.
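
One iteration of the standard fuzzy c-means updates can be sketched as follows. This is a minimal pure-Python illustration of the textbook update formulas, not the authors' implementation. The fuzziness exponent m = 2 and the function names are assumptions, and the `dist` parameter is a hypothetical hook where another distance could be supplied in place of the Euclidean one.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def update_memberships(data, prototypes, m=2.0, dist=euclidean):
    """Return U with U[i][k] = membership of data[k] in cluster i."""
    power = 2.0 / (m - 1.0)
    U = [[0.0] * len(data) for _ in prototypes]
    for k, x in enumerate(data):
        d = [dist(x, v) for v in prototypes]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # The point coincides with one or more prototypes:
            # share the full membership among them.
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i, di in enumerate(d):
            U[i][k] = 1.0 / sum((di / dj) ** power for dj in d)
    return U


def update_prototypes(data, U, m=2.0):
    """Each prototype is the mean of the data weighted by u_ik ** m."""
    n_features = len(data[0])
    prototypes = []
    for row in U:
        weights = [u ** m for u in row]
        total = sum(weights)
        prototypes.append([
            sum(w * x[j] for w, x in zip(weights, data)) / total
            for j in range(n_features)
        ])
    return prototypes


data = [[0.0], [1.0], [9.0], [10.0]]
U = update_memberships(data, [[0.5], [9.5]])
print(update_prototypes(data, U))
```

The weighted-mean prototype update is derived for the Euclidean distance; using it unchanged with another distance is a simplification of this sketch.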

3.3 Neural Networks 

Neural networks have been widely used as input-output mappings for different applications,
including modeling, control and classification [4]. The main characteristics of a
neural network are its parallel distributed structure and its ability to learn, which allows
it to produce reasonable outputs for inputs not encountered during training. Moreover, the
structure can be chosen to be simple enough to compute the output(s) from the given input(s)
in very low computational time.

The neural network used in this paper has hidden layers with hyperbolic tangent
activation functions and a linear output layer. The network must have few neurons in the
hidden layers, and is trained using the resilient backpropagation algorithm [3].

3.4 Fuzzy Models 

Fuzzy models have gained in popularity in various fields such as control engineering,
decision making, classification and data mining [10]. One of the important advantages
of fuzzy models is that they combine numerical accuracy with transparency in the form
of rules. Hence, fuzzy models take an intermediate place between numerical and symbolic
models. A method that has been extensively used for obtaining fuzzy models is
fuzzy clustering. Several fuzzy modeling approaches to classification based on fuzzy
clustering have been compared in [11]. Of those compared, the optimization of
Takagi-Sugeno fuzzy models using a genetic algorithm (GA), as described in [9], proved
to be the best, and therefore this algorithm is applied in this paper.

4 Application: Marble Classification 

4.1 Parameters of the Algorithms 

The clustering algorithms are tested for two situations: clustering veins for the color 
with more elements, where the training set has 29 marbles and the test set 14 marbles, 
and clustering colors, where the training set has 69 marbles and the test set 43 marbles. 

First, simulated annealing and fuzzy c-means were applied using the Euclidean distance
and the weighted Manhattan (WM) distance. These classifiers used 6 and 3 clusters to
classify marble colors and marble veins, respectively. In the SA algorithm, the acceptable
change used to compute the initial temperature is 0.3, the constant used to compute
the number of iterations performed at each temperature is set to 5, the parameter $d$
in (10) is equal to 0.9, and finally 100 temperature decreases are allowed.

In the feedforward neural networks, the parameters are derived experimentally in
order to obtain good classification results. In this paper, three hidden layers with 9, 12
and 9 neurons gave the best classification results. The number of epochs is
set to 100 in order to avoid overfitting.

The Takagi-Sugeno fuzzy model returns a real number as output, which must be
transformed into a class $c_k$. Therefore, the output $y$ of the model is assigned to a class as
follows:

$$ c_k = \mathrm{round}(y) . \qquad (12) $$

The classification value $c_k$ belongs to the set $\{1, 2, 3\}$ for the case of the veins,
and to the set $\{1, 2, 3, 4, 5, 6\}$ when the colors are classified.
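
The rounding step in (12) can be written as a one-line helper. The sketch below rounds halves upward and clamps the result to the valid class range. Both choices are assumptions made for illustration, since the paper does not say how ties or out-of-range outputs are handled, and the function name is hypothetical.

```python
import math


def to_class(y, n_classes):
    """Map a real-valued model output to a class label in 1..n_classes."""
    label = int(math.floor(y + 0.5))       # round to nearest, halves upward
    return min(max(label, 1), n_classes)   # ASSUMPTION: clamp to valid labels


print([to_class(y, 3) for y in (0.2, 1.4, 2.5, 3.7)])
```

Python's built-in `round` rounds halves to the nearest even integer, which is why the sketch uses `floor(y + 0.5)` instead.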

4.2 Results 

Table 1 presents the comparison of the results obtained for vein classification. The error
measures are the following: e, the mean classification error; SC, the global score of the
clusters as defined in (5); and se, the normalized similarity error as defined in (9). The
clustering algorithms in Table 1 are simulated annealing (SA), simulated annealing using the
WM distance (SAw), fuzzy c-means (FC), fuzzy c-means using the WM distance
(FCw), neural networks (NN) and the fuzzy model optimized by genetic algorithms
(FGA). Table 2 presents the comparison of the results obtained for color classification.
The notation used in Table 1 also applies to Table 2.



Table 1. Comparison of the results obtained for vein clustering.

               e                  se                  SC
         train    test      train    test      train    test
  SA      0.41    0.42       0.00    0.14         90      23
  SAw     0.38    0.50       0.03    0.00        112      23
  FC      0.41    0.36       0.03    0.29         89      21
  FCw     0.38    0.50       0.03    0.00        112      23
  NN      0.10    0.21       0.03    0.14        130      22
  FGA     0.10    0.50       0.03    0.07        130      19



Table 2. Comparison of the results obtained for color clustering.

               e                  se                  SC
         train    test      train    test      train    test
  SA      0.48    0.67       0.06    0.14        195      93
  SAw     0.46    0.53       0.03    0.16        240      87
  FC      0.45    0.63       0.06    0.12        197      69
  FCw     0.48    0.60       0.06    0.14        234      92
  NN      0.13    0.26       0.09    0.07        254      81
  FGA     0.10    0.44       0.01    0.16        268      52

In both tables it is clear that the results of the unsupervised methods are not as good as
those of the supervised methods, especially when evaluated with traditional measures.
However, the measures se and SC proposed in this paper allow us to conclude that
in general the WM distance improved the performance of the unsupervised algorithms.
Thus, the apparently large differences between supervised and unsupervised algorithms are
less evident with the introduced measures SC and se. These results are confirmed by
visualizing the marbles. Due to lack of space, we present only the visual results of
one clustered region in the vein clustering presented in Table 1. The marble classification
is presented in Figure 1.


Fig. 1. Example of marble vein clusters: results for one region in the training set.
Panel (f): GA optimized fuzzy model.



5 Conclusions 



This paper deals with the problem of clustering natural surfaces based on their visual
appearance (marble classification in this case). Standard classification error measures do
not correspond satisfactorily to the classification results obtained by visual inspection by
human experts. Therefore, new measures are proposed in this paper, namely a similarity
table and a similarity score, which are both based on expert evaluations. Further, we also
introduced the so-called weighted Manhattan distance, which is based on a weighted
expert table and tuned through the use of genetic algorithms. This distance was applied
to simulated annealing and fuzzy c-means clustering, and in general it improved the
clustering results. Two supervised techniques were also tested: neural networks and a
fuzzy model optimized by genetic algorithms. All the tested algorithms have shown that
the set of features used to describe the marbles is quite adequate for clustering based
on visual appearance.

Future work includes further studies of the weighted Manhattan distance, further
evaluation and possible improvements of the newly introduced measures, and finally a
possible combination of clustering techniques to obtain a reliable marble classification
system.


Abstract. Most people are used to signing documents, and because of this signing is a
trusted and natural method for user identity verification, reducing the cost of
password maintenance and decreasing the risk of eBusiness fraud. In the proposed
system, identity is securely verified and an authentic electronic signature
is created using biometric dynamic signature verification. Shape, speed, stroke
order, off-tablet motion, pen pressure and timing information are captured and
analyzed during the real-time act of signing the handwritten signature. The
captured values are unique to an individual and virtually impossible to duplicate.
This paper presents a study of various HMM-based techniques for signature
verification. Different topologies are compared in order to obtain an optimized
high-performance signature verification system, and signal normalization preprocessing
makes the system robust with respect to writer variability.



1 Introduction 

Day by day, natural and secure access to interconnected systems is becoming more
and more important. It is also necessary to verify people's identity in a fast, easy-to-use
and user-friendly way.

Traditionally, during the process of identification and controlling the access to sys- 
tems or applications, we used objects, e.g. keys or smart cards, or we used knowledge 
based systems like PINs, or passwords. However, objects may be lost and knowledge 
may be forgotten and both may be stolen or copied. 

Biometrics [1] relies on several personal and unique body features (e.g. fingerprints,
iris or retina) and individual behavioral features (e.g. the way of speaking,
writing, signing or walking). Those individual features, either physical or behavioral,
allow each individual to be identified uniquely, offering a solution to the conventional
security problem. Because of this, biometric solutions are considered one of the most
trusted and natural ways of identifying a person and controlling access to systems and
applications.

Normally, most citizens do not trust biometric identification systems based
on body features like fingerprints, iris or retina, because they feel these systems are
related to criminal and police matters. However, features related to our behavior
are accepted even though they are much less precise.

Research groups from four Spanish universities joined in the research project [2]
called “Aplicacion de la Identificacion de Personas mediante Multimodalidad Biometrica
en Entornos de Seguridad y Acceso Natural a Servicios de Informacion”. The
first result of this project was the creation of a Multimodal Biometric Database [3]
(fingerprints, signatures and voice), which is the starting point for the rest of the
research of each participating group. This paper is a result of the subsequent research on
handwritten signatures using that database.

Section 2 is an introduction to signature verification, Section 3 describes the
system, Section 4 presents the results obtained and, finally, Section 5 gives the
conclusions.



2 Signature Verification 

Handwritten signature is commonly used and accepted as a way to verify people’s 
identity; we usually sign documents to verify their contents or to authenticate financial 
transactions. Signature verification usually consists just of an “eye inspection” as if we 
compared two photographs, but this is not an efficient method against impostors and 
many times there is no verification process at all. 

The automation of the verification process tries to improve the current situation and
eliminate eBusiness fraud. Automatic signature verification is divided into two
main areas, depending on the way the data are acquired. In off-line signature verification,
the signature is available in a handwritten document, which is scanned to obtain
the digital representation of the image. In on-line signature verification, on the other
hand, specific hardware (digitizing tablets) is used to register pen movements on the
paper during the act of signing.

Off-line verification is used with signatures from past documents, not acquired in a 
digital format, and only the shape of the signature remains important. However, in on- 
line verification, we also use dynamic information of the signature, such as pen pres- 
sure or inclination, apart from the 2D spatial representation. The presence of the indi- 
vidual at the time of the digital capture is also required. 

3 System Description

3.1 Online Signature Acquisition Module 

Our system uses a graphics tablet from Wacom as capturing device. More precisely it 
is the Intuos A6 model with USB interface. This tablet provides 100 samples per sec- 
ond containing values for pressure and the four degrees of freedom: X and Y coordi- 
nates, pen azimuth and inclination for every sample. 

Strokes with no pressure, also known as pen-ups, are also sampled. Because of
this, the system knows the trajectory of the pen both with and without ink, which
provides extra information and makes the system more robust.

The signature information, once digitized, is stored in a file as a matrix, and after- 
wards it may be used to create a new input in the database or as a test signature for the 
verification process. 







Fig. 1. The digitized signature consists of a sequence of sample points along the signature,
captured at a frequency fixed by the acquisition device. Its length is directly proportional to
the time of signing, in this example 9.4 s. Pen-up symbols occur during the time shown in black.



3.2 Online Signature Database 

The system uses a database in which each individual has 25 true signatures. In addition,
each individual makes 5 forgeries of each of the 5 immediately preceding
entries in the database. This means that for every individual we have 25 true signatures
and 25 forgeries made by 5 different people.

Data from 150 individuals were used in the research presented in this paper, i.e.
3,750 files of true signatures and the same number of forgeries.

The forgeries in the database are skilled forgeries, as the impostor tries several
times to imitate the true user's signature before the forgery is acquired and finally
stored in the database. To improve the quality of the forgeries, we encouraged
the participants to do their best by offering them a prize.

3.3 Signature Preprocessing

Every time we sign, we do it in a different way. Because of this, some factors like 
speed variation, different sizes or rotations or different places within the tablet have to 
be taken into account in order to get a representation of the signature independent 
from these factors. 

The preprocessing module performs a time normalization so that the resulting signatures
have the same length, i.e. the same number of samples. To do this, interpolation
or extrapolation is applied, depending on the number of samples of the original signature.
This normalization is optional, as the user can decide to keep the original size.



Fig. 2. Time normalization algorithm. In the example, n = 1.4 represents the ratio between the
original size of the signature (420 samples) and the normalized one (300 points). The acquired
points are represented as $(x_j, y_j)$, with j from 0 to 420, and the normalized points are
$(x'_i, y'_i)$, with i from 0 to 300.



Coordinates after normalization are calculated with the following algorithm:

$$ (x'_i, y'_i) = (a \cdot x_j + b \cdot x_{j+1}, \; a \cdot y_j + b \cdot y_{j+1}) \qquad (1) $$

with:

    j = floor(i * n)
    b = (i * n) - j
    a = 1 - b
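
The time normalization in (1) is a linear interpolation between consecutive acquired points. A minimal Python sketch follows. The function name is hypothetical, and the clamping of the index at the end of the signature is an assumption added so that the sketch also works when the signature is lengthened rather than shortened.

```python
import math


def resample(points, n_out):
    """Resample a list of (x, y) points to n_out points by linear interpolation."""
    n_in = len(points)
    ratio = n_in / n_out                    # n in Fig. 2, e.g. 420 / 300 = 1.4
    resampled = []
    for i in range(n_out):
        pos = i * ratio
        j = min(int(math.floor(pos)), n_in - 1)
        b = pos - j
        a = 1.0 - b
        j_next = min(j + 1, n_in - 1)       # ASSUMPTION: repeat the last point at the end
        x = a * points[j][0] + b * points[j_next][0]
        y = a * points[j][1] + b * points[j_next][1]
        resampled.append((x, y))
    return resampled


signature = [(float(t), 2.0 * t) for t in range(420)]
normalized = resample(signature, 300)
print(len(normalized), normalized[1])
```

Only the two pen coordinates are resampled here; the other recorded channels (pressure, azimuth, inclination) could be interpolated in the same way.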



Yang, Widjaja and Prasad's method [4] consists of an algorithm that eliminates
size variability (X-Y coordinates) and rotations with respect to the tablet. These
authors use the absolute value of the angle corresponding to the segment that joins two
consecutive normalized points, using the formula below:

$$ \theta(k) = \arctan \frac{\sum_{l=i+1} s_l \sin\theta_l^{(k)}}{\sum_{l=i+1} s_l \cos\theta_l^{(k)}} \qquad (2) $$

with $\theta_l^{(k)} = \theta_l - \theta_1$, where $\theta_1$ is the absolute angle of the first segment and $s_l$ is
the length of the segment between two consecutive points. This formula normalizes
the signature and subtracts the absolute angle of the first segment at the same time.

To improve the computational efficiency of this algorithm we propose some modifications
to Yang's original formula, adapting it to the algorithm represented in Figure 2.
Expanding the trigonometric expressions $\sin(\theta_l - \theta_1)$ and $\cos(\theta_l - \theta_1)$, where
$\theta_l$ is the absolute angle of segment $l$, and using $\Delta y_l = s_l \sin\theta_l$ and
$\Delta x_l = s_l \cos\theta_l$, Yang, Widjaja and Prasad's formula (2) takes this new form:

$$ \phi(k) = \arctan \frac{\sum_{l=i+1} (\Delta y_l \cos\theta_1 - \Delta x_l \sin\theta_1)}{\sum_{l=i+1} (\Delta x_l \cos\theta_1 + \Delta y_l \sin\theta_1)} \qquad (3) $$



Although this formula seems much more complex, it is more efficient, as $\cos\theta_1$
and $\sin\theta_1$ are only computed once, because they are constant for all the samples
along the signature. Moreover, if we apply this algorithm to the signature of normalized
length, the final result is as follows:

$$ \phi(i) = \arctan \frac{(y'_{i+1} - y'_i)\cos\theta_1 - (x'_{i+1} - x'_i)\sin\theta_1}{(x'_{i+1} - x'_i)\cos\theta_1 + (y'_{i+1} - y'_i)\sin\theta_1} \qquad (4) $$
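
The direction angle of each segment, rotated by the angle of the first segment, can be sketched as follows. The sketch computes the cosine and sine of the first angle once, as described above. It uses `atan2` on the numerator and denominator instead of the arctangent of their ratio, which is a choice made here to avoid division by zero and to keep the full angular range. The function name and the `subtract_first` switch are hypothetical.

```python
import math


def direction_angles(points, subtract_first=True):
    """Angle of each segment between consecutive points, optionally
    measured relative to the angle of the first segment."""
    dx = [q[0] - p[0] for p, q in zip(points, points[1:])]
    dy = [q[1] - p[1] for p, q in zip(points, points[1:])]
    theta1 = math.atan2(dy[0], dx[0]) if subtract_first else 0.0
    c, s = math.cos(theta1), math.sin(theta1)   # computed only once
    return [
        math.atan2(dyi * c - dxi * s, dxi * c + dyi * s)
        for dxi, dyi in zip(dx, dy)
    ]


path = [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(direction_angles(path))
print(direction_angles(path, subtract_first=False))
```

Setting `subtract_first=False` gives the plain direction angles, which corresponds to the variant without the first-angle subtraction.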



3.4 Model Training 

Training a verification system consists of generating a model of the item that we want 
to verify using a set of observations of it. Models are initialized using the first 5 origi- 
nal signatures of each signer and reestimated using 4 more signatures. Afterwards they 
are stored in a database. 

The set of observations used to generate the model must show the natural variation 
of the user’s signature and the efficiency of these systems depends strongly on how 
representative these observations are of the user’s signature during the creation of the 
database. 

3.5 Verification and Threshold Selection

To verify whether the signer is a true user or an impostor, we calculate the similarity
between his/her signature and the trained model. We then compare this value with the
selected threshold to decide whether to accept or reject the signature.



Fig. 3. Initialization and reestimation process of the user's model (blocks: model
definition, configuration file, user's model).



When defining this threshold, we have to take into account the security level needed for
our application, that is, whether we need a low FAR (False Acceptance Rate) or a low FRR
(False Reject Rate), as reducing one of these values means increasing the other. Normally,
a security system should guarantee a FAR close to zero, but this implies a higher
FRR, because the two rates trade off against each other.

To check how accurate this algorithm is, we studied the DET (Detection Error Tradeoff)
plots for all the users, defining the minimum cost point as follows:

$$ DCF = C_{miss} \cdot P_{miss|True} \cdot P_{true} + C_{fa} \cdot P_{fa|False} \cdot P_{false} \qquad (5) $$

where $C_{miss}$ and $P_{miss|True}$ are respectively the cost and probability of a false reject,
$C_{fa}$ and $P_{fa|False}$ the cost and probability of a false acceptance, $P_{true}$ is the a priori
probability of the target and $P_{false} = 1 - P_{true}$. This function is evaluated at every point
along the DET plot to find the point where it takes its minimum value.
This point defines the threshold for which the accuracy of the algorithm is optimal.
Another reference point is the EER (Equal Error Rate) point, where FAR and FRR are
the same.
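
The threshold search based on (5) can be sketched as a sweep over candidate thresholds. This is an illustrative stand-in, not the NIST software. It assumes that scores are similarities, so a signature is accepted when its score is at or above the threshold, and the prior of 0.5 and the function names are hypothetical. The 10 to 1 weighting of false acceptances follows the weighting mentioned in the text.

```python
def error_rates(genuine, forgery, threshold):
    """Return (FAR, FRR) when signatures scoring >= threshold are accepted."""
    frr = sum(score < threshold for score in genuine) / len(genuine)
    far = sum(score >= threshold for score in forgery) / len(forgery)
    return far, frr


def min_dcf_threshold(genuine, forgery, c_miss=1.0, c_fa=10.0, p_true=0.5):
    """Sweep thresholds and return (dcf, threshold, far, frr) at the minimum cost."""
    candidates = sorted(set(genuine) | set(forgery))
    candidates.append(candidates[-1] + 1.0)     # a threshold that rejects everything
    best = None
    for threshold in candidates:
        far, frr = error_rates(genuine, forgery, threshold)
        dcf = c_miss * frr * p_true + c_fa * far * (1.0 - p_true)
        if best is None or dcf < best[0]:
            best = (dcf, threshold, far, frr)
    return best


genuine_scores = [0.9, 0.8, 0.7, 0.4]
forgery_scores = [0.6, 0.3, 0.2, 0.1]
print(min_dcf_threshold(genuine_scores, forgery_scores))
```

Because false acceptances cost ten times more than false rejects in this example, the selected threshold sits above every forgery score even though one genuine signature is then rejected.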

These plots have been calculated using the free software of NIST [5]. These programs
dynamically change the reject threshold and calculate the FAR and the FRR for
different situations. The more collaborative the user is, the lower the FRR we obtain:
his/her DET plot will be closer to the axes and the EER will be lower too.



4 Developed Models 

In our first models, signatures were described using the directional normalized angle 
along the trajectory of the signature (equation (4)). An important part of this study was 
the definition of the number of states, the number of symbols, the transition matrix, 
and the initial probability of the distribution of the states, i.e. the topology of the mod- 
els. 

We ran some tests to determine which topology of the HMMs [6] was the most
efficient. These tests were made with signatures normalized to 300 samples and
quantized into 32 symbols. We verified that 6-state L-R (left-right) models were
more efficient than other L-R models. The worst results were obtained using ergodic
or generalized models, i.e. those in which transitions between all the states are allowed.
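
The difference between the two topologies can be made concrete with their initial transition matrices. The sketch below is illustrative only. It assumes that a left-right model allows a state to stay in place or move to the next state, with no skips, that the initial probabilities are uniform before reestimation, and that angles are quantized uniformly over one full turn. None of these details are given in the paper, and the function names are hypothetical.

```python
import math


def left_right_transitions(n_states, stay=0.5):
    """ASSUMPTION: each state either stays or moves to the next state."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0                   # the last state is absorbing
        else:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
    return A


def ergodic_transitions(n_states):
    """Every transition is allowed, with uniform initial probabilities."""
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]


def quantize_angle(phi, n_symbols=32):
    """ASSUMPTION: uniform quantization of an angle in [-pi, pi]."""
    symbol = int((phi + math.pi) / (2.0 * math.pi) * n_symbols)
    return min(max(symbol, 0), n_symbols - 1)


for row in left_right_transitions(6):
    print(row)
print(quantize_angle(0.0))
```

In the left-right matrix all entries below the diagonal are zero, so the model can never return to an earlier part of the signature, whereas the ergodic matrix has no such constraint.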

Having defined the architecture of our models, we tested the application's accuracy
for different normalized lengths. We normalized all the signatures to 100, 200, 300,
400, 500 and 600 samples, and we also used non-normalized signatures (keeping their
original size). We found that the algorithm was more accurate using values between
300 and 500. These values are clearly related to the average signature duration of
about 3 seconds, which corresponds to about 300 samples at 100 samples per second.




4.1 Verification Results 

We studied the performance of the system working in verification mode, i.e. validat- 
ing a person’s identity comparing the captured signature with the individual’s tem- 
plate, which is stored in the system database. Table 1 shows the initial results. 

Table 1. Initial results using Yang, Widjaja and Prasad's preprocessing, which only includes
angles. Although the FAR is close to 0, the false reject rate is nearly 50%, which means that an
average user should sign twice to gain access to the system.

                 FAR        FRR        EER
  Mean value     0.295%     48.21%     17.565%



To set the optimum working point, we calculated the minimum cost point with the
NIST functions, weighting false acceptances 10 to 1 against false rejects in order to
favor a very low FAR at the expense of a high FRR.

Fig. 5. DET (Detection Error Tradeoff) plot for user No. 208, with FAR and FRR as X and Y
axes. The EER is quite similar to the mean value shown in Table 1 and the FRR is slightly better.

In the next tests we eliminated the first-angle subtraction proposed by Yang, Widjaja
and Prasad, because we believe that the database creation methodology (users
signed inside a grid) made it unnecessary. The new results showed that the FRR was
halved by eliminating this subtraction, implying that it introduced noise harmful to
the verification process. Finally, our system was trained including pressure, azimuth
and inclination.

Table 2. Results with the new preprocessing method including angles, pressure, azimuth and
inclination.

                 FAR        FRR        EER
  Mean value     0.00%      31.52%     9.253%

Introducing these additional parameters results in a remarkable improvement in the
efficiency of the algorithm.



5 Conclusions 

The first-angle subtraction proposed by Yang, Widjaja and Prasad is unnecessary in
our system because all users sign inside a grid, and it introduces noise harmful to the
verification process. For a system in which users are not asked to sign inside a grid,
we propose subtracting the angle of the principal axis of inertia of the signature, as it is
a more stable value than the first angle.

Adding further parameters such as speed, acceleration, mass center, inertia axis,
length of linear and circular segments [7], curvature radii, etc. is expected to result in
a large improvement of the EER of the system, satisfying commercial requirements.

Multimodal fusion of several biometric methods (fingerprints, voice, signature,
etc.) is another way to improve the efficiency of the verification. In the same way, we
could consider intramodal fusion, combining several verification methods based on
the same biometric feature. Fusion of on-line and off-line signature methods can make
the system more robust and efficient.


Abstract. The Continuous Distance Transformation (CDT) used in
conjunction with a k-NN classifier has been shown to provide good results
in the task of handwriting recognition [1]. Unfortunately, efficient
techniques such as kd-tree search methods cannot be directly used in
the case of certain dissimilarity measures like the CDT-based distance
functions. In order to avoid exhaustive search, a simple methodology
which combines kd-trees for fast search and the Continuous Distance
Transformation for fine classification is presented. The experimental results
obtained show that the recognition rates achieved have no significant
differences from those found using an exhaustive CDT-based classification,
with a very important reduction in temporal cost.



1 Introduction 

Statistical non-parametric methods, such as k-nearest neighbor (k-NN) classifiers,
provide very good results on many pattern recognition tasks (e.g. [11, 4]).
One of the basic requirements for these methods to obtain good performance,
however, is the use of a very large database of labeled prototypes. In some tasks,
like handwritten character recognition, collecting a large number of examples is
not as hard as in other applications, but searching through the whole database
to find the nearest objects to a test image is time-consuming, and has to be done
for every character in a document. This has been a recurring argument against
the use of k-NN classifiers for this task, since a character recognizer is supposed
to carry out many classifications per second on a moderately powerful machine
to be useful and competitive. On the other hand, the use of complex distances
aggravates the problem of speed in the classification phase, where a distance
computation from each test character to every prototype in the training set must
be done.

In order to avoid searching over the whole training set, kd-tree data struc- 
tures can be used. A kd-tree is a binary tree which allows fast and approximate 
search in large databases, providing results very similar to exhaustive search. 
Unfortunately, only certain metrics, including L-norms, can be naturally used 
as a distance function by traditional kd-tree search algorithms. 

In this work, the problem of reducing the computational complexity associated
with a k-NN classifier using complex distance functions has been approached
in a simple way: find a number k' of nearest neighbors using a fast approximate
kd-tree search in a first feature space and then use a more complex or qualified
dissimilarity measure, applying exhaustive k-nearest-neighbor search to the
subset of k' neighbors selected by the kd-tree. This results in a huge reduction
in the number of heavy computations that need to be performed. CDT-based
features and distance functions have been selected in this paper because of the
good results reported in previous works [1] for the particular task of handwriting
recognition.
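
The two-stage idea can be sketched in a few lines of Python. To keep the example self-contained, the first stage below is a brute-force search with a cheap distance, standing in for the approximate kd-tree search, and both stages work on the same vectors, whereas the paper uses a separate first feature space. The function names, the distances and the toy data are hypothetical.

```python
import heapq
from collections import Counter


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def squared_euclidean(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def two_stage_knn(query, prototypes, labels, cheap_dist, fine_dist, k_prime, k):
    """Preselect k_prime candidates with cheap_dist, then vote among the
    k nearest candidates according to fine_dist."""
    # Stage 1: stand-in for the fast approximate kd-tree search.
    candidates = heapq.nsmallest(
        k_prime, range(len(prototypes)),
        key=lambda i: cheap_dist(query, prototypes[i]))
    # Stage 2: exhaustive search with the expensive measure, candidates only.
    nearest = heapq.nsmallest(
        k, candidates, key=lambda i: fine_dist(query, prototypes[i]))
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


prototypes = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(two_stage_knn((0.2, 0.1), prototypes, labels,
                    manhattan, squared_euclidean, k_prime=4, k=3))
```

The expensive measure is evaluated only k' times per query instead of once per prototype, which is where the saving comes from when k' is much smaller than the training set.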



2 The Continuous Distance Transformation 

Obtaining feature maps from images in which the distance relationships among
pixels are taken into account is the goal of a well-known technique usually
referred to as Distance Transformation or DT [10]. The Distance Transformation
is traditionally defined as an operation that transforms a binary image
consisting of feature and non-feature pixels into a distance map, where all
non-feature pixels have a value corresponding to the distance (any suitable
distance function on the plane) to the nearest feature pixel [6].
Unfortunately, binarization is a necessary step when computing the classical
Distance Transforms from continuous-valued images, and it causes a loss of
information.

Recently, a generalization of the DT, the Continuous Distance Transformation
(CDT), has been presented as a technique to compute distance maps from
continuous-valued images [2]. Applicable to gray-level images, the CDT
technique avoids the binarization process and makes use of the whole
information content of the original range of representation.

Taking the definition of Distance Transformation as a basis, an item (i,j) of
a “Distance Map to the Nearest White Pixel” holds the distance from pixel (i,j)
on the image to the nearest white pixel. Note that this value can be interpreted
as the number of fringes expanded from (i,j) until the first fringe holding a
white pixel is reached, where a “fringe” is defined as the set of pixels that
are at the same distance from (i,j).
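
The fringe interpretation can be illustrated with a minimal sketch for binary images. It assumes the chessboard distance on the plane (the metric later used in the experiments); the function names `fringe` and `distance_to_nearest_white` are illustrative and do not come from the original work.

```python
def fringe(i, j, r, rows, cols):
    """Pixels at chessboard distance exactly r from (i, j), clipped to the image."""
    if r == 0:
        return [(i, j)]
    cells = []
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            if max(abs(di), abs(dj)) == r:
                y, x = i + di, j + dj
                if 0 <= y < rows and 0 <= x < cols:
                    cells.append((y, x))
    return cells


def distance_to_nearest_white(image, white=1):
    """For each pixel, count the fringes expanded until one holds a white pixel.

    Returns None for a pixel when the image holds no white pixel at all.
    """
    rows, cols = len(image), len(image[0])
    max_radius = max(rows, cols)  # no chessboard distance can exceed this
    dist = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for r in range(max_radius):
                if any(image[y][x] == white for y, x in fringe(i, j, r, rows, cols)):
                    dist[i][j] = r
                    break
    return dist
```

This brute-force version is meant only to mirror the definition; practical distance transforms use much faster propagation schemes.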

A parallelism between a distance map of binary images and one whose pixel
values are defined in the gray-scale domain [0..MaxBright] implies replacing
the “white pixel” concept by the “maximum bright value”, and actions such as
“find the nearest white pixel” by “accumulate a maximum bright value on an
expanding neighborhood”. Moreover, the value of an item on the continuous
distance map is a function of the pixel value itself as well as of the number
of fringes expanded until the accumulated bright value reaches a threshold,
according to a certain criterion of bright value accumulation that is applied
to the pixels belonging to each fringe analyzed. Thus, the concept of “distance
to the nearest white pixel” is replaced by the concept of “distance from a
pixel to the limit of its minimum area of brightness saturation”.
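
The following is a deliberately simplified sketch of this idea, not the CDT definition given in [2]. It assumes that the fringe value is the maximum pixel value on the fringe, that fringe values are summed until they reach MaxBright, and that the map stores the number of fringes expanded (capped at a maximum). The reverse map is assumed here to be the same computation on the inverted image. All names are illustrative.

```python
def fringe(i, j, r, rows, cols):
    """Pixels at chessboard distance exactly r from (i, j), clipped to the image."""
    if r == 0:
        return [(i, j)]
    return [(i + di, j + dj)
            for di in range(-r, r + 1)
            for dj in range(-r, r + 1)
            if max(abs(di), abs(dj)) == r
            and 0 <= i + di < rows and 0 <= j + dj < cols]


def saturation_map(image, max_bright=1.0, max_fringes=3):
    """Number of fringes expanded until the accumulated brightness saturates.

    Assumption: the value of a fringe is its maximum pixel value, and values are
    summed starting from the pixel itself (fringe 0). Pixels that do not
    saturate within max_fringes expansions are capped at max_fringes.
    """
    rows, cols = len(image), len(image[0])
    result = [[max_fringes] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            accumulated = 0.0
            for r in range(max_fringes + 1):
                cells = fringe(i, j, r, rows, cols)
                if cells:
                    accumulated += max(image[y][x] for y, x in cells)
                if accumulated >= max_bright:
                    result[i][j] = r
                    break
    return result


def reverse_saturation_map(image, max_bright=1.0, max_fringes=3):
    """Same computation on the inverted image (assumed reading of 'reverse')."""
    inverted = [[max_bright - v for v in row] for row in image]
    return saturation_map(inverted, max_bright, max_fringes)
```

On a binary image the first function reduces to the fringe count to the nearest white pixel, which is the parallelism described above.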

Two types of CDT-based maps can be defined: the Continuous Distance Map to
Direct Scale Saturation, or Θ^D, and the Continuous Distance Map to Reverse
Scale Saturation, or Θ^R, depending on whether a maximum value of bright
intensity or a maximum value of reverse bright intensity is accumulated,
respectively. Both maps provide distinct information about a point and its
surrounding area. In [2], detailed descriptions of these concepts are
presented. Given an image, either a Θ^D map or a Θ^R map is more or less
descriptive depending on its brightness distribution. The cost of computing a
CDT map is in O(m² x n²) for an image of n x m pixels, but, in practice, it is
much lower.

Several distance and dissimilarity measures based on the Continuous Distance
Transformation can be used to take advantage of the full possibilities of the
representation obtained. These measures are collected in three generic
families, each including several distance functions [2]:

- Continuous Fringe Distance measures, including the Fringe Distance using Θ^D
  maps (FDD), the Fringe Distance using Θ^R maps (FDR), and the Symmetrical
  Fringe Distance (SFD), which uses both Θ^D and Θ^R maps.

- Continuous Pixel Distance measures, or PDLp, which use the L-norm metric
  in their computation along with Θ^D and Θ^R maps.

- L-norm between CDT maps. Three sub-families can be computed depending
  on the CDT maps taken: the LpD metrics if the L-norm of Θ^D maps is
  computed; the LpR metrics if the Θ^R maps are used instead; and the LpDR
  metrics if both maps are employed.



3 Fast Approximate Search of k Nearest Neighbors Using kd-Trees

The nearest neighbor search problem can be formulated in several distinct do- 
mains: from Euclidean vector spaces to (pseudo) metric spaces. Most algorithms 
intended for vector spaces are directly based on the construction of a data struc- 
ture known as kd-tree [7, 5]. 

A kd-tree is a binary tree where each node represents a region in a
k-dimensional space. Each internal node also contains a hyper-plane (a linear
subspace of dimension k - 1) dividing the region into two disjoint sub-regions,
each inherited by one of its children. Most of the trees used in the context of
our problem divide the regions according to the points that lie in them.

In many cases, an absolute guarantee of finding the real nearest neighbor of 
the test point is not necessary. In this sense, a number of algorithms of approx- 
imate nearest neighbor search have been proposed [3, 8]. 

In a kd-tree, the search for the nearest neighbor of a test point starts from
the root, which represents the whole space, and chooses at each node the
sub-tree that represents the region of the space containing the test point.
When a leaf is reached, an exhaustive search of the b prototypes residing in
the associated region is performed. Unfortunately, the process is not complete
at this point, since the region containing the test point may not be the one
containing the nearest prototype. It is easy to determine whether this can
happen in a given configuration, in which case the algorithm backtracks as many
times as necessary until all the regions that can hold a prototype nearer to
the test point have been checked.

If a guaranteed exact solution is not needed, the backtracking process can be
aborted as soon as a certain criterion is met by the current best solution. In
[3], the concept of (1 + ε)-approximate nearest neighbor query is introduced,
along with a new data structure, the BBD-tree. A point p is a
(1 + ε)-approximate nearest neighbor of q if the distance from p to q is less
than 1 + ε times the distance from q to its true nearest neighbor.
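
The search with backtracking and its (1 + ε) relaxation can be sketched as follows. This is a plain kd-tree with median splits and a simplified pruning test based only on the distance to the splitting hyper-plane; it is not the BBD-tree of [3], and the names `build_kdtree` and `nearest` are illustrative.

```python
import math


class Node:
    """Leaf nodes hold points; internal nodes hold a splitting hyper-plane."""

    def __init__(self, points=None, axis=None, split=None, left=None, right=None):
        self.points = points
        self.axis = axis
        self.split = split
        self.left = left
        self.right = right


def build_kdtree(points, leaf_size=4, depth=0):
    """Split at the median of one coordinate per level until leaves are small."""
    if len(points) <= leaf_size:
        return Node(points=list(points))
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(axis=axis, split=ordered[mid][axis],
                left=build_kdtree(ordered[:mid], leaf_size, depth + 1),
                right=build_kdtree(ordered[mid:], leaf_size, depth + 1))


def nearest(node, query, eps=0.0, best=(math.inf, None)):
    """Return (distance, point). With eps > 0 the result is (1 + eps)-approximate."""
    if node.points is not None:
        # Leaf: exhaustive search over the prototypes stored in this region.
        for p in node.points:
            d = math.dist(p, query)
            if d < best[0]:
                best = (d, p)
        return best
    diff = query[node.axis] - node.split
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, eps, best)
    # Backtrack only if the far region could hold a point closer than
    # best / (1 + eps); every point there is at least |diff| away.
    if abs(diff) * (1.0 + eps) < best[0]:
        best = nearest(far, query, eps, best)
    return best
```

With `eps=0.0` the search is exact; larger values skip more regions during backtracking, trading accuracy for speed.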

4 Combining kd-Trees and the Continuous Distance Transformation

A simple methodology combining kd-trees and the Continuous Distance
Transformation for handwriting recognition is proposed. In a first step, fast
search using kd-trees is applied to a test observation in order to get a number
k' of nearest prototypes. In a second step, an exhaustive search of the k
nearest neighbors, k < k', is carried out among the k' pre-selected prototypes
using specific features and distance functions in order to obtain better
performance.

For handwritten character classification we tested CDT-based distance
functions because of the good results reported in previous work [1]. The
computational cost of this combined methodology is significantly lower than
that of exhaustive search over the whole training set. The cost depends on k',
the number of neighbors pre-selected, because it determines the number of
distance computations in the second step.
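
The two steps can be sketched as below. To keep the example self-contained, the first step uses a linear scan with the Euclidean distance as a stand-in for the approximate kd-tree search, and the expensive measure is passed in as an arbitrary function `fine_distance`; the prototype layout and all names are assumptions made for illustration only.

```python
import heapq
import math
from collections import Counter


def two_stage_knn(query_coarse, query_fine, prototypes, fine_distance,
                  k_prime=100, k=4):
    """Classify a query by pre-selecting k' candidates and re-ranking them.

    prototypes: list of (coarse_vector, fine_representation, label) tuples.
    """
    # Step 1: cheap pre-selection of k' candidates in the first feature space
    # (a linear scan stands in here for the kd-tree search).
    candidates = heapq.nsmallest(
        k_prime, prototypes, key=lambda p: math.dist(p[0], query_coarse))
    # Step 2: exhaustive k-NN among the candidates with the costly measure.
    neighbors = heapq.nsmallest(
        k, candidates, key=lambda p: fine_distance(p[1], query_fine))
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Only k' evaluations of the costly measure are needed per query, instead of one per training prototype.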

5 Experiments 

The main goal of the experiments was to show whether the proposed technique
performs similarly to the exhaustive search method in the task of handwritten
character recognition, and to quantify the improvements in computation time.

In this context, several configurations of the classification parameters can be
identified. In order to analyze the behavior of the system in each of those
possible settings, three experimental phases were planned: a) selection of the
features to use in the kd-tree pre-classifier; b) test of the performance of
different CDT variations; and c) test for different values of k and k'.

5.1 Datasets, Preprocess, and Algorithms 

The well-known NIST Special Database 3 (SD3) contains a large number of
isolated handwritten characters: lower-case letters, upper-case letters, and
digits. In this work, the 44951 upper-case letters from the SD3 were chosen for
the experiments. The characters are stored as 128 x 128 binary images,
segmented and labeled by hand. The database was split into two sets used for
error estimation: the first 39941 upper-case letters for training, and the last
5009 for test. No writer appeared in both sets.

Table 1. Results of phase a). Error rates (%) of the three methods using
different feature maps, for k'=100, k=4, the PDL3 distance function, and ε=1.5.

Method \ Map       Image    Θ^D      Θ^R
kd-tree            6.05     8.66     4.99
kd-tree & CDT      3.87     4.07     3.85
Exhaustive CDT     3.85 (independent of the kd-tree feature map)

To obtain a usable representation of the images in a lower dimensional space,
common resampling and normalizing procedures were applied. These techniques
generate gray-level images that keep most of the original information. Thus,
the character images were sub-sampled from 128 x 128 binary pixels into 28 x 28
gray values by first computing the minimum inclusion box of each character,
keeping the original aspect ratio, and then accumulating into each of the
28 x 28 divisions the area occupied by black pixels to obtain a continuous
value between 0 and 1.
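
A possible reading of this procedure is sketched below. It assumes that the inclusion box is centered inside a square whose side equals the longer box dimension (one way to keep the aspect ratio), and that each output cell stores the fraction of its area covered by black pixels. The names `subsample` and `_overlaps` are illustrative.

```python
def _overlaps(start, cell, size):
    """Yield (cell index, overlap length) for the unit interval [start, start + 1)."""
    first = int(start // cell)
    last = int((start + 1 - 1e-12) // cell)
    for c in range(max(first, 0), min(last, size - 1) + 1):
        lo = max(start, c * cell)
        hi = min(start + 1, (c + 1) * cell)
        if hi > lo:
            yield c, hi - lo


def subsample(binary, size=28):
    """Map a binary image (1 = black ink) to a size x size image with values in [0, 1]."""
    ink = [(y, x) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    out = [[0.0] * size for _ in range(size)]
    if not ink:
        return out
    top, bottom = min(y for y, _ in ink), max(y for y, _ in ink) + 1
    left, right = min(x for _, x in ink), max(x for _, x in ink) + 1
    # Square box around the character, centered, so the aspect ratio is kept.
    side = max(bottom - top, right - left)
    y0 = top - (side - (bottom - top)) / 2.0
    x0 = left - (side - (right - left)) / 2.0
    cell = side / size  # source pixels per output cell along each axis
    for y, x in ink:
        for cy, wy in _overlaps(y - y0, cell, size):
            for cx, wx in _overlaps(x - x0, cell, size):
                out[cy][cx] += wy * wx / (cell * cell)
    # Guard against rounding slightly above 1.0.
    return [[min(1.0, v) for v in row] for row in out]
```

For the setting described in the text, `subsample(image, size=28)` would be applied to each 128 x 128 character image.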

The Continuous Distance Transformation was applied to the subsampled
images, obtaining the CDT maps. Some parameters that influence this process
were fixed in order to present comparable results: the maximum number of fringe
expansions was set to 3; the fringe value function chosen was “the maximum
pixel value on the fringe”; and the L∞, or chessboard, distance was the metric
on the plane used to determine the pixels belonging to each fringe^.

In order to make use of the kd-tree, Principal Component Analysis (PCA)
was performed on the representations to reduce their dimensionality to 40 [9].
The approximate nearest neighbor algorithm used was based on [3].



5.2 Phase a) 

In the first phase, the main goal was to compare performance using the
following three classification methodologies: 1) approximate k-NN search using
kd-trees with k=4; 2) approximate k' nearest neighbors search using kd-trees
with k'=100, followed by exhaustive k nearest neighbors search with k=4 using
a CDT-based measure over the 100 pre-selected neighbors; and 3) exhaustive k
nearest neighbors search using a CDT-based measure with k=4 over the whole
training set. In the kd-tree classifier, the image and the Θ^D and Θ^R maps
were tested as feature vectors.

The approximation parameter ε used for approximate search in kd-trees was
1.5. It has been shown that this value reduces the computation time without
affecting recognition rates in other related experimental contexts [9]. A
common dissimilarity measure, PDL3, was chosen for the tests that involve CDT
evaluations. The results are shown in Table 1 as error rates at zero rejection.

Notice the excellent error rate of 4.99% obtained from the kd-tree technique
using the Θ^R map as features, and the 3.85% error rate obtained from our
combined methodology. Both rates significantly improve on the error rate
obtained from kd-trees when the image itself is used as features.

^ Some tools and functions in ANSI C related to the Continuous Distance
Transformation can be found at http://tafetan.disca.upv.es/sw/cdt

Table 2. Results of phase b). Error rates (%) for CDT-based measures using the
combined methodology for each value of p (where applicable), k'=100, k=4, and
ε=1.5. The kd-tree features were the Θ^R map.

Distance \ p  1     2     3     4     5     6     7     8     9     10    11    12
LpD           9.42  8.64  8.48  9.00  9.32  -     -     -     -     -     -     -
LpR           5.77  4.83  4.51  4.39  4.29  4.11  4.19  4.23  4.31  4.39  -     -
LpDR          6.19  4.97  4.39  4.23  4.07  4.19  4.13  4.01  4.05  4.17  4.15  4.09
FDD           7.77 (p not applicable)
FDR           4.83 (p not applicable)
SFD           8.00 (p not applicable)
PDLp          5.39  4.13  3.85  3.81  3.73  3.75  3.89  3.85  3.77  3.91  -     -

5.3 Phase b) 

According to the results of the previous phase, the combined kd-tree and CDT
methodology was chosen to be analyzed in more detail. Thus, in order to find
the best CDT-based measures using our methodology in the task of handwriting
recognition, tests over all CDT families of measures were carried out. For
those measures having an exponent p (related to the L-norm of the CDT maps),
values of p between 1 and 12 were tested. For the Fringe Distance measure
family, the three existing functions were used. The results presented in
Table 2 show that the Pixel Distance family gives the best performance with
very low error rates, followed by the LpDR and LpR metrics.

5.4 Phase c) 

This phase is intended to analyze the performance of the combined kd-tree and
CDT system as a function of the number of pre-selected nearest neighbors k'.
Several tests using the PDL5 distance function for an exhaustive search through
a number of pre-selected prototypes between 25 and 500 were performed. For each
k' value tested, error rates for several k values between 1 and 7 are provided
too. The results are shown in Figure 1.

For a given value of k, the performance of the combined method approaches
the exhaustive CDT rates as k' increases. In practice, for a moderately large
k' the system performs like the exhaustive search. In the context of the
experiments presented, good results are achieved from around 75 neighbors.
Nevertheless, this number is expected to be a function of the size of the
training set, growing as the size of the training set increases.

The recognition times per character (on a Pentium III, 800 MHz) for the
three methodologies presented were: 3.17 ms using the kd-tree search, 7.78 ms
using kd-trees and CDT with k'=100, and 287.5 ms using exhaustive CDT search.
Further, the computation times for different values of k' in the combined
methodology were directly proportional to the k' value used, and perfectly
competitive in a real application.

[Figure 1: plot; axis label “nearest neighbors (k)”]

Fig. 1. Results of phase c). For values of k' > 75, no difference with
exhaustive search is found.

Table 3. Error rates (%) of the three methods for k'=100, k=3, and ε=1.5.

Method \ Distance      L9R     L9DR    FDR     PDL5
Exhaustive CDT         4.17    3.93    5.05    3.69 (287 ms/char)
kd-tree (Θ^R) & CDT    4.17    3.93    5.01    3.65 (7.78 ms/char)
kd-tree (image)        6.05 (3.69 ms/char)
kd-tree (Θ^R)          4.99 (3.17 ms/char)

5.5 Summary 

In Table 3, a summary of results is shown. Error rates are presented for the
best measure from each CDT-based family obtained in phase b) (PDL5, L9DR, L9R
and FDR, excluding the LpD distances), using a k value of 3 (the best one from
phase c)) and a value of k' = 100. Results for the kd-tree technique using
image and Θ^R features are also shown. A significant reduction of 1.06
percentage points (from 6.05% to 4.99%) can be achieved using the Θ^R map as
input features to the kd-tree instead of the image features directly (PCA to
40-D is always applied as the last feature extraction step). The error
reduction reaches 2.4 percentage points (from 6.05% to 3.65%) when the combined
methodology is applied using the Pixel Distance with p=5.

As shown in Figure 1, values of k' lower than 100 can be used when a higher 
recognition speed is needed, at the expense of a small increase of the error rate. 
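
The trade-off summarized above can be made concrete with a short calculation that uses only the times and error rates reported in this section; the variable names are illustrative.

```python
# Recognition times per character (ms) as reported in the text.
time_kdtree = 3.17
time_combined = 7.78      # kd-tree pre-selection with k'=100 followed by CDT
time_exhaustive = 287.5

# Error rates (%) as reported in Table 3.
err_kdtree_image = 6.05
err_kdtree_map = 4.99
err_combined_pdl5 = 3.65

speedup_vs_exhaustive = time_exhaustive / time_combined   # roughly 37 times faster
slowdown_vs_kdtree = time_combined / time_kdtree           # roughly 2.5 times slower

# Absolute error reductions, in percentage points.
gain_map_features = err_kdtree_image - err_kdtree_map
gain_combined = err_kdtree_image - err_combined_pdl5

print(f"combined vs exhaustive: {speedup_vs_exhaustive:.1f}x faster")
print(f"combined vs kd-tree only: {slowdown_vs_kdtree:.2f}x slower")
print(f"error reduction with CDT map features: {gain_map_features:.2f} points")
print(f"error reduction with combined method: {gain_combined:.2f} points")
```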

6 Conclusions

A combined methodology using kd-trees and the Continuous Distance
Transformation is presented. In a first step, fast search using kd-trees is
applied to each observation in order to get a selected number of nearest
prototypes. Then, exhaustive search among the pre-selected prototypes using
more complex measures, such as the CDT-based ones, is carried out to refine the
result. The error rates obtained are equivalent to those from exhaustive search
on the whole training set. The execution times reported are an order of
magnitude lower than those of exhaustive search and only moderately higher than
those of approximate kd-tree search methods, with significantly lower error
rates that clearly compensate for this small cost increase in most cases.

Abstract. An algorithm for the automatic identification of drawing
tools based on the appearance of the stroke boundary is presented. The
purpose of this stroke analysis is the determination of drawing tools
in underdrawings (the basic concept of an artist) in ancient panel
paintings. This information provides significant support for a systematic
stylistic approach in the analysis of paintings. Up to now, the
identification of drawing tools has been performed visually by an expert.
Our tool will support the expert in investigating larger numbers of
underdrawings, provide objective and reproducible information, and simplify
the comparison of different underdrawings. Stroke analysis in paintings is
related to the extraction and recognition of handwriting; therefore, similar
techniques are used for stroke analysis. Following the segmentation, the
stroke boundary is approximated by active contours with different
parameters. Deviations between a rigid and an elastic “snake” are used as
descriptive features for differentiating between drawing tools. Results of
the algorithm are presented for sets of three different types of strokes.



1 Introduction 

Computer-aided analysis is an important tool for the examination of works of
art [1]. Within an interdisciplinary project between the fields of art history
and image analysis, we are developing a system to investigate infrared images
(infrared reflectograms [2]) of medieval and Renaissance panel paintings with
methods of digital image processing and analysis. In conservation and art
history, three prominent questions are of particular interest. The first deals
with the development of underdrawings, their relations to other drawings, and
the relation between the underdrawing and the covering painting. Secondly, art
historians and restorers are interested in the style of the underdrawing, and
in whether the underdrawing is sketchy, freehand, or a copy from a template.
Finally, an important question is what kinds of materials and drawing tools
were used in an underdrawing [3].

This paper contributes to answering the third question and addresses the
automatic identification of drawing tools used in the underdrawings of medieval
paintings. This is a first step towards a subsequent stylistic analysis of
underdrawings and a classification of painters. We present a method that
analyzes the visual appearance of the stroke boundary. Up to now,
appearance-based analysis of drawing tools has been carried out only visually
by experts. A problem of visual analysis is that it is often not possible to
inspect a large number of drawings as a whole and in detail, and therefore the
analysis is usually reduced to selected objects and to certain regions of a
painting. The limited human visual memory further complicates the comparison of
different underdrawings concerning drawing tools, drawing materials, and stroke
characteristics. Automatic analysis tools will make the analysis more
objective. They will speed up the recognition process, provide objective and
reproducible data, and support the experts in studying underdrawings.




Fig. 1. Stroke details showing tools using fluid materials on the left, brush (a), quill 
(c), reed pen (e) and dry material tools on the right, black chalk (b), silver point (d) 
and graphite (f). 



The paper is organized as follows. Section 2 gives an overview of the
characteristics of drawing tools and materials used in medieval underdrawings.
The analysis process, like the standard approach, is split into a segmentation
step, a refinement step and a feature extraction step, which are described in
Section 3. Section 4 presents and discusses the results obtained with our
method, and Section 5 gives a brief overview of work in progress and future
work.

2 Characterizing Drawing Tools / Materials 

Drawing tools used in medieval panel paintings can be categorized into two
different types: those that use fluid materials and those that use dry drawing
materials [3]. Figure 1 depicts six example strokes, three for each group.
Three strokes represent the class of drawing tools using fluid materials
(a, c, e) and three strokes represent dry materials (b, d, f). These examples
have been taken from a panel prepared for our experiments by a restorer.

Our analysis approach is based on the observation that prominent
characteristics of drawn strokes are variations of shape and variations of the
intensity in the drawing direction. Table 1 gives an overview of the
characteristics of the two groups of drawing tools. The first characteristic we
analyzed is the boundary of a stroke. It can be observed that there are
variations in smoothness depending on the drawing tool used. While strokes
applied with a pen or brush using a fluid medium show a smoother boundary, the
boundary of strokes applied with a dry material, e.g. black chalk or graphite,
is less smooth. This observation is used in the method presented in the next
section.

Fig. 2. Schematic diagram of our approach

Table 1. Characteristics of different drawing tools and materials

Tools / Materials              Characteristics
fluid materials                fluid lines
- paint or ink applied by      - continuous and smooth
  pen or brush                 - vary in width and density
                               - pooling of paint at the edges
                               - droplet at the end
                               - different endings (brush/pen)
dry materials                  dry lines
- charcoal                     - less continuous and smooth
- chalks                       - less variation in width
- metal points                 - more granular
- graphite

3 Stroke Segmentation and Feature Extraction 

Stroke segmentation in paintings is related to the extraction and recognition
of handwriting [4]. Letters and words in Western languages and symbols or signs
in Chinese or Japanese are built of manually drawn strokes or lines. Many
approaches start with thresholding and thinning methods. While these methods
are fast and save resources, they discard valuable information; a more detailed
analysis of strokes requires an approach that also incorporates boundary
information [5]. We used Doermann’s segmentation algorithm in the segmentation
part of our approach, since it provides both the boundary of a stroke and
intensity profiles, which will be used to characterize strokes. Figure 2 gives
an overview of our approach, which consists of three basic steps: segmentation,
boundary refinement and feature extraction.

Fig. 3. Segmentation and refinement: (a) cross sections and polygonal boundary;
(b) edgels with axis and gray value profiles; (c) initial “top” and “bottom”
boundary; (d) converged “rigid” (black) and “non-rigid” (white, dashed) snakes

Segmentation. In Step I, edgels located at the stroke contour are first
detected by a Canny edge detector. Second, cross sections (perpendicular
to the stroke boundary) are built by connecting pairs of opposite edgels.
To form a cross section, the gradient vectors of the two edgels have to point
in opposite directions. Finally, neighboring cross sections are linked into
groups, each representing a stroke segment. Figure 3(a) shows the cross
sections grouped into one stroke segment and the polygonal boundary. For
further algorithmic details we refer to [1].

Boundary refinement. In Step II, the approximation of the stroke boundary
by a closed polygon is refined by “snakes”, a method based on active contours
[6]. After determining the principal component of the edgel distribution of
a stroke segment, the contour is split into two sides (“top” and “bottom”
boundary) that are treated separately. A set of gray value profiles,
perpendicular to the axis, represents the domain for the snake algorithm.
Figure 3(b) shows the equidistant profiles in the original image, and (c)
shows them rearranged as an image. The snake moves through this domain to
minimize an energy functional determined by inner parameters controlling the
rigidity and tension of the snake and by an external energy influenced by a
gradient vector flow, in order to provide accurate and fast convergence to
boundary concavities.

Feature extraction. Contour estimates with different levels of elasticity
provide descriptive information by means of their deviation from each other.
We used two successive snakes. The first, rigid snake was initialized on the
coarse contour estimate. The second, more elastic snake proceeds from this
position. The mean (MEAN) and the standard deviation (SDV) of the deviation
between the two snakes are used as descriptive features. For more details
please refer to [7].
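
The two features can be sketched as follows, under assumptions that go beyond the text: each snake is represented by one offset per gray value profile, the deviation is the signed difference between the elastic and the rigid offset, and the population standard deviation is used. The function name is illustrative.

```python
import statistics


def snake_deviation_features(rigid, elastic):
    """Return (MEAN, SDV) of the per-profile deviation between two snakes.

    rigid, elastic: sequences of boundary offsets, one value per profile.
    Assumption: the deviation is signed (elastic minus rigid).
    """
    if len(rigid) != len(elastic) or len(rigid) == 0:
        raise ValueError("both snakes need the same, non-zero number of samples")
    deviations = [e - r for r, e in zip(rigid, elastic)]
    return statistics.mean(deviations), statistics.pstdev(deviations)
```

A smooth boundary, where both snakes settle on nearly the same curve, gives values close to zero for both features; a granular boundary lets the elastic snake wander away from the rigid one and raises the SDV.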

Fig. 4. The left column shows details from the test panel with strokes used in
our experiments: brush strokes (873x729 pixels) (a), chalk strokes (992x631
pixels) (c) and graphite strokes (989x729 pixels) (e). The right column shows
the detected boundaries of the segmentation step (the original images are
displayed darker for better illustration of the boundaries).



4 Experimental Results and Discussion 

In our experiments we studied the differences between three types of drawing
tools: brush, chalk and graphite. Test panels (21cm x 30cm) containing sets of
the mentioned strokes were prepared by a restorer. The test panels were
digitized using a flat-bed scanner with an optical resolution of 1200 dpi.
Details from the images, as depicted in Figure 4, were cropped manually.
Figure 4 (a) shows a series of brush strokes, (c) chalk strokes and (e)
graphite strokes, all applied in the bottom-up direction.

Fig. 5. Details from the test panel showing strokes used in our experiments and
the overlay of the snakes: brush strokes (a), chalk strokes (b) and graphite
strokes (c).



The result of the segmentation step is illustrated in Figure 4 (b), (d) and
(f), respectively. The boundaries of the stroke segments that consist of at
least 20 cross sections are depicted. The segmentation algorithm works well for
most of the brush strokes and graphite strokes. Problems arise, e.g., at the
left stroke in Figure 4(a), which is not segmented completely, since the stroke
width parameter was set too narrow. The segmentation algorithm still has
problems with overlapping strokes like the “arrow top” in the leftmost stroke
of Figure 4(f) and (d). Problems also occur with the chalk strokes in
Figure 4(d), which are segmented into many small segments due to the
inhomogeneity of the strokes. This necessitates a further processing step,
which will be handled together with the overlapping problem.

Fig. 6. Standard deviation (SDV) and MEAN of the snake deviations. The
deviations are measured on the “top” and “bottom” boundary of the individual
brush, chalk and graphite strokes.



For the refinement and feature extraction steps, the stroke segments shown
are used. First, the refinement step is initialized with the boundary from the
segmentation step. The refinement algorithm, i.e. the adaptation of the two
snakes with different rigidity, is applied separately to the “top” and “bottom”
boundary of a stroke. Figure 5(a,b,c) shows three exemplary strokes together
with an overlay of the more elastic (dotted bright line) and more rigid snake
(underlying black line). It can be observed that the deviation between the
rigid and elastic snakes is smaller for the brush stroke than for the black
chalk and graphite strokes.

To show the differences calculated, the SDV and MEAN values of the deviations
of the two snakes, i.e. two values per stroke, one for the “top” and one for
the “bottom” boundary, are plotted in the diagram of Figure 6. The MEAN values
of the brush strokes (denoted as circles) are concentrated near zero, while
there is a higher variation of the MEAN values of graphite strokes (denoted as
“x”) and chalk strokes (denoted as stars). Similarly, the standard deviation
SDV of brush strokes is below 0.2 for all but two of the stroke borders. The
SDV values for chalk and graphite are between 0.2 and 1.6 in our samples. So
the SDV feature makes it possible to distinguish between brush, i.e. a fluid
drawing tool, and graphite and chalk as dry drawing tools. Using a combination
of SDV and MEAN, the data of our samples can be used to differentiate between
graphite and chalk, since most of the chalk values are positioned to the right
of and above the graphite values.

Still, these results are preliminary and experiments with more samples are
necessary. Furthermore, the reliability of this differentiation can be improved
if a set of strokes is considered. As can be observed in underdrawings, in
certain regions of a drawing several strokes are applied with the same drawing
tool, e.g. as hatches or cross hatches.
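
As an illustration only, the observations above suggest a simple decision rule. The sketch assumes the 0.2 SDV threshold read off the reported samples and a majority vote over the boundaries of a set of strokes drawn with the same tool; neither the vote nor the function names are part of the method described here.

```python
from collections import Counter


def classify_boundary(sdv, threshold=0.2):
    """Label one stroke boundary from its SDV feature (threshold is an assumption)."""
    return "fluid" if sdv < threshold else "dry"


def classify_stroke_set(sdv_values, threshold=0.2):
    """Majority vote over the boundaries of strokes assumed to share one tool."""
    votes = Counter(classify_boundary(s, threshold) for s in sdv_values)
    return votes.most_common(1)[0][0]
```

Voting over a hatched region makes the decision less sensitive to the few brush boundaries whose SDV exceeds the threshold.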

5 Conclusion and Outlook 

The boundary analysis algorithm presented in this paper successfully detects
and refines the boundary of strokes and extracts features that make it possible
to differentiate between dry and fluid drawing tools. The first results showed
that the visual appearance of the boundary of a stroke can be used for
discrimination. Still, further experiments with more samples are necessary to
validate our method. The next steps will also incorporate additional features,
like the texture of the different types of strokes, to obtain a measure of the
granularity of a stroke. Further, we have noticed that, in some cases, there is
a difference between the “top” and “bottom” boundary of a stroke for dry
drawing tools. This observation has to be verified and evaluated. As reported,
some problems occur in the segmentation step if the strokes are interrupted. So
one of our goals is to improve the robustness of the segmentation step and to
extend the approach to segment overlapping and crossing stroke formations as,
e.g., reported in [8].

Abstract. Global localization is the problem of determining the position
of a robot under global uncertainty. This problem can be divided into
two phases: 1) from the sensor data (or sensor view), determine clusters
of hypotheses where the robot can be; and 2) devise a strategy by which
the robot can correctly eliminate all but the right location. In the second
phase, previous approaches consider an ideal robot, that is, a robot with a
perfect odometer, to predict robot movements. This paper introduces a
non-deterministic prediction approach based on a Markov localization that
includes an uncertainty model for the movements of the robot. The
non-deterministic model can help to solve situations where a deterministic
or ideal model fails. Hypotheses are clustered and a greedy search algorithm
determines the robot movements to reduce the number of clusters
of hypotheses. This approach is tested using a simulated mobile robot
with promising results.



1 Introduction 

Global localization is the problem of determining the location of the robot under 
global uncertainty. This problem arises, for example, when a robot uses a map 
that has been generated in a previous run, and it is not informed about its initial 
location within the map. 

The global localization problem can be seen as consisting of two phases: 
hypothesis generation and hypothesis elimination [4] . The first phase is to deter- 
mine the set of hypothetical locations H that are consistent with the sensing data 
obtained by the robot at its initial location. The second phase is to determine, 
in the case that H contains two or more hypotheses, which one is the true loca- 
tion of the robot, eliminating the incorrect hypotheses. Ideally, the robot should 
travel the minimum distance necessary to determine its exact location. 

This paper presents an approach to solve the global localization problem in
a known indoor environment modeled by an occupancy grid map, a two-dimensional
map where the environment is divided into square regions or cells of the same
size. This approach is an improved version of the global localization approach
given in [10]. We use a Markov localization (see [7,10]) in both phases, to
represent and update the set H of hypotheses, and to predict movements of the
robot in order to eliminate hypotheses. The main contribution of this paper is
to predict movements using a Markov localization that includes an uncertainty
model for the movements of the robot. Previous approaches [8,5,10] only
consider a deterministic or ideal model for the robot (a robot with a perfect
odometer) during the prediction process.

The rest of this paper is organized as follows. Section 2 describes relevant 
issues of our approach to generate hypotheses [10]. Section 3 presents the frame- 
work of Markov localization. Section 4 explains our approach to eliminate hy- 
potheses. Experimental results using a mobile robot simulator are shown in Sec- 
tion 5. We choose a simulator because it is easy to create complex environments, 
with many similar places, to test the robustness of our approach. Finally, some 
conclusions are given in Section 6. 

2 Hypotheses Generation 

In this section a simple occupancy grid map is used as an example to show the 
ideas behind the proposed approach. Figure 1 (a) shows this simple map built 
using a mobile robot simulator. Figure 1 (b) shows a local map view, extracted 
from the map for the position of the robot shown in (a), considering that the 
robot direction is aligned with a global fixed direction (pointing downwards 
in this case). Figure 1 (c) shows the sensor view (generated by the simulator) 
considering that the robot is aligned with the global direction. Both views have 
an angular resolution of 0.5 degrees, a common value found in laser range sensors, 
and include the robot position for reference as a black cell. Perceptual limitations 
are taken into account by setting a maximum range of 3 meters. The problem consists
in estimating the set H of possible locations that have local map views consistent
with the sensor view.

2.1 Polar Correlation 

To have a simple model of robot motion and hence a small state space in the
Markov localization, we assume that the robot should be in one of 8 possible
directions $\{\theta_i = 45 \cdot i$ degrees, $i = 0, \dots, 7\}$ with respect
to the global fixed direction, one for each adjacent cell. A polar correlation,
using a sum of absolute differences, can be used to find the match between a
local map view and the sensor view. Figure 2 shows the correlation results for
all the possible angular displacements of the sensor view against the local map
view shown in Figure 1. From the minimum difference an angular displacement can
be computed to align the robot with one of the directions $\theta_i$. Obviously,
the right angular displacement should be indicated considering the most probable
position of the robot. In the case of Figure 2 the angular displacement
corresponds to -21 degrees, and the best estimated direction is 270 degrees. As
the Markov localization needs a probabilistic value $p(s \mid l)$ of perceiving
a sensor view $s$ given that the robot is at location
$l = (\langle x, y \rangle, \theta_i)$, a difference value $d(s, v(l))$ can be
computed from the correlation results between the sensor view $s$ and the local
map view at the cell $\langle x, y \rangle$, denoted by $v(l)$, and then a
probabilistic value can be obtained from $d(s, v(l))$. We compute $d(s, v(l))$
as the minimum difference (in the correlation results) for an angular interval
with center at $\theta_i$. The desired probability is computed by
$p(s \mid l) = [\ldots]$, where $\alpha$ is a positive real number.

Fig. 1. A simple environment. From left to right: (a) Occupancy grid map. (b) Local
map view computed from the map and the robot location shown in (a). (c) Actual
sensor view

Fig. 2. Correlation results. Angular displacements are from 0 (left) to 359.5 degrees
(right)

Given that this procedure is expensive, the next section shows a fast way to 
find a small set of candidate cells to apply this procedure, instead of all the free 
cells in the map. 
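
The polar correlation described above can be illustrated with a small sketch. It assumes that both views are stored as lists of range readings with the same angular resolution; the function names and the toy 8-beam views are hypothetical and are not taken from the paper, which works with 720 beams at 0.5 degrees.

```python
def polar_correlation(map_view, sensor_view):
    """Sum of absolute differences for every circular shift of the sensor view."""
    n = len(map_view)
    if len(sensor_view) != n:
        raise ValueError("views must have the same angular resolution")
    scores = []
    for shift in range(n):
        sad = sum(abs(map_view[i] - sensor_view[(i + shift) % n]) for i in range(n))
        scores.append(sad)
    return scores


def best_shift(map_view, sensor_view):
    """Shift with the smallest difference, together with that difference."""
    scores = polar_correlation(map_view, sensor_view)
    k = min(range(len(scores)), key=scores.__getitem__)
    return k, scores[k]


# Toy views with 8 beams (illustrative values only).
map_view = [3.0, 3.0, 1.0, 1.5, 2.0, 3.0, 3.0, 2.5]
sensor_view = map_view[-2:] + map_view[:-2]  # same scene, rotated by two beams
shift, diff = best_shift(map_view, sensor_view)
```

In this toy case the minimum of the correlation curve recovers the rotation that was applied to the view. Turning the minimum difference into a probability is left out, since it depends on the formula for $p(s \mid l)$.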

2.2 Roadmap 

Following the ideas described in [9], the set of cells where the robot is
allowed to move is chosen so as to keep a fixed distance k to obstacles. Figure 3 (a)
shows the full set of free cells where the robot can be as white pixels, while Figure
3 (b) shows the cells that form the roadmap. There is a significant reduction
in the number of cells. To get a robust procedure, our approach considers a
thick roadmap (see Fig. 3 (c)) which includes the cells in the neighborhood of the
thin roadmap (see Fig. 3 (b)). The idea is to use the thin roadmap to predict
movements of the robot (in the case of more than one group of hypotheses)
and to use the thick roadmap to restrict the possible locations where the robot
can be. The following section describes the process to update the probability of
hypotheses after the robot senses or moves.

Fig. 3. A roadmap. From left to right: (a) Full set of free cells. (b) Thin roadmap for
k = 1 m. (c) Thick roadmap

3 Markov Localization

Following [5], the key idea of Markov localization is to compute a probability
distribution over all possible locations in the environment. $p(L_t = l)$ denotes the
probability of finding the robot at location $l$ at time $t$. Here, $l$ is a location in
$x$-$y$-$\theta_i$ space where $x$ and $y$ are Cartesian coordinates of cells and
$\theta_i$ is a valid orientation. $p(L_0)$ reflects the initial state of knowledge and
it is uniformly distributed to reflect the global uncertainty. $p(L_t)$ is updated
whenever:

1. The robot moves. Robot motion is modeled by a conditional probability,
denoted by $p_a(l \mid l')$, which is the probability that motion action $a$,
when executed at $l'$, carries the robot to $l$. $p_a(l \mid l')$ is used to update the
belief upon robot motion:

$$p(L_{t+1} = l) \leftarrow \frac{\sum_{l'} p_a(l \mid l')\, p(L_t = l')}{p(s)} \qquad (1)$$

Here $p(s)$ is a normalizer that ensures that $p(L_{t+1})$ sums up to 1 over all $l$.
An example of $p_a(l \mid l')$ is shown in Figure 4, considering that the robot moves
forward in the thick roadmap, and the orientation of the robot is aligned to
one of the possible 8 directions $\theta_i$. Circles denote grid cells and the most
probable transition in Figure 4 is labeled with 4/10.

2. The robot senses. When sensing $s$,

$$p(L_{t+1} = l) \leftarrow \frac{p(s \mid l)\, p(L_t = l)}{p(s)} \qquad (2)$$

Here $p(s \mid l)$ is the probability of perceiving $s$ at location $l$. In order to get an
efficient procedure to update the probability distribution, cells with
probability below some threshold $u$ are set to zero.
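
The two update rules can be sketched over a belief stored as a dictionary from locations to probabilities. This is only an illustration under stated assumptions: the corridor, the transition probabilities (0.8 forward, 0.2 stay) and the likelihood values are invented for the example and are not the model of Figure 4, and renormalizing after trimming is an assumption, since the text only says that small values are set to zero.

```python
def normalize(belief):
    """Scale the belief so that it sums to 1 (the role of p(s) in the text)."""
    total = sum(belief.values())
    if total == 0.0:
        raise ValueError("belief vanished; no location is consistent with the data")
    return {l: p / total for l, p in belief.items()}


def motion_update(belief, transition):
    """Equation (1): new[l] = sum over l' of p_a(l | l') * belief[l'].

    transition[l_prev] is a dict {l_next: probability} (hypothetical layout).
    """
    new = {}
    for l_prev, p_prev in belief.items():
        for l_next, p_move in transition.get(l_prev, {}).items():
            new[l_next] = new.get(l_next, 0.0) + p_move * p_prev
    return normalize(new)


def sensing_update(belief, likelihood, threshold=0.0):
    """Equation (2): new[l] = p(s | l) * belief[l]; values below threshold are dropped."""
    new = normalize({l: likelihood.get(l, 0.0) * p for l, p in belief.items()})
    trimmed = {l: p for l, p in new.items() if p > 0.0 and p >= threshold}
    return normalize(trimmed)  # renormalizing after trimming is an assumption


# Five cells in a corridor, uniform initial belief (global uncertainty).
belief = {c: 0.2 for c in range(5)}
transition = {c: ({c + 1: 0.8, c: 0.2} if c < 4 else {c: 1.0}) for c in range(5)}
moved = motion_update(belief, transition)
sensed = sensing_update(moved, {1: 0.1, 2: 0.9, 3: 0.1}, threshold=0.05)
```

After one motion step and one sensing step the belief concentrates on the cell whose local view best matches the (invented) sensor likelihood, while cells with zero likelihood drop out of the representation.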





Fig. 4. Robot motion for one direction
4 Hypothesis Elimination

Locations $l$ with $p(L_t = l) > u$ are clustered into $n$ clusters or groups according
to their location $l$ using AutoClass [3], a Bayesian clustering technique.

The idea to eliminate hypotheses is to move the robot through the thin 
roadmap trying to reduce the number of clusters. 

To get an efficient procedure, our approach considers that the mobile robot is
at the most probable location and then considers all the cells of the thin roadmap
as valid movements for the robot. If we assign to locations of the thin roadmap
the number of possible groups of hypotheses, a good movement to eliminate
hypotheses is to direct the robot toward the nearest cell with fewer than $n$ groups.
Let this cell be called the goal cell.
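
A minimal sketch of the goal cell selection follows. It assumes that the thin roadmap is given as an adjacency dictionary and that the predicted number of hypothesis groups per cell has already been computed; both data structures, and measuring "nearest" in roadmap steps with a breadth first search, are assumptions made for the illustration.

```python
from collections import deque


def nearest_goal_cell(start, neighbors, groups_at, n_groups):
    """Breadth-first search over the thin roadmap from `start`.

    Returns the first cell reached whose predicted number of hypothesis
    groups is smaller than n_groups, or None if no such cell exists.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if groups_at[cell] < n_groups:
            return cell
        for nxt in neighbors.get(cell, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Hypothetical roadmap: a - b - c, with a branch b - d - e.
neighbors = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b", "e"], "e": ["d"]}
groups_at = {"a": 3, "b": 3, "c": 3, "d": 2, "e": 1}
goal = nearest_goal_cell("a", neighbors, groups_at, n_groups=3)
```

Cell "e" would remove more groups, but "d" is closer to the start, so the greedy criterion selects "d".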

To compute the similarity between the local map views associated with two cells of
the thin roadmap we introduce a similarity matrix. Let $c_i$ ($i = 1, \dots, m$) denote
the $m$ cells of the thin roadmap, and $sim(c_i, c_j)$ be the similarity between the
local map views associated with cells $c_i$ and $c_j$. A similarity measure $sim$ can be
computed using the correlation technique previously presented:

$$sim(c_i, c_j) = \max_{k = 0, \dots, 7} \{ p(s = v(c_i) \mid l = (c_j, \theta_k)) \} \qquad (3)$$

The values $sim(c_i, c_j)$ for all $i, j = 1, \dots, m$ form a similarity matrix $S$ that
can be computed from the map and the roadmap, before the localization process starts.

If there is more than one group of hypotheses, we can predict a robot
movement in two different ways: 1) using an ideal model for the robot movements
and 2) using a model that includes uncertainty. Let these types of prediction be
called deterministic and non deterministic prediction respectively.

In both cases we use a Markov localization representation $p'(L)$ to track
groups of hypotheses under the possible set of virtual movements of the robot,
and $p(L)$ to represent the set of hypotheses of the location of the robot, given the
set of real movements. When a prediction process starts, $p'(L)$ and $p(L)$ are the
same. Once a goal cell in $p'(L)$ is computed, assuming that the robot location
is given by the most probable hypothesis, the robot can move towards the goal
cell, updating $p(L)$. The prediction process is repeated while there is more than
one group of hypotheses, until there is only one group.

4.1 Deterministic Prediction 

If the movements of the robot are considered deterministic or ideal, they can
be represented as a rotation followed by a translation [5]. Let $c_b$ be the most
probable location where the robot can be. After a given virtual movement $v$ from
$c_b$ to a cell $c_i$ of the thin roadmap, the transformation given by $v$ can be applied
to all locations of the probability distribution. Let $c_j$ be the new position for
one hypothesis, after transformation $v$. After this virtual movement, a virtual
sensing is applied. Here we use the similarity matrix to estimate $p(s \mid l)$, assuming
that the robot is at location $l = \langle c_j, \theta_j \rangle$, and that $s$ corresponds
to the local map view from cell $c_i$ (the most probable):

$$p(s \mid l) = \begin{cases} sim(c_i, c_{j'}) & \text{if } c_j \in \text{thick roadmap} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

where $c_{j'}$ is the cell of the thin roadmap closest to $c_j$.



4.2 Non Deterministic Prediction 

An improvement over the deterministic prediction is to include an uncertainty
model for the virtual movements of the robot (e.g. the model illustrated in Figure
4). A virtual movement from a cell $c_b$ to $c_i$ (as in the deterministic prediction) is
split into a sequence $\langle (m_1, s_1), (m_2, s_2), \dots \rangle$, where $m_i$ indicates
a motion step to an adjacent cell, and $s_i$ is a sensing step. In other words, instead of
considering only a target cell where the robot can be, there will be a set of cells
(limited by the trimming process after the sensing step). The sensing step is the same
as in the deterministic prediction.

Considering that the thin roadmap is usually one or two cells wide, results
from an adjacent cell can be used to compute further results, giving a fast
algorithm. In the implementation we use a breadth first search over the thin
roadmap.



5 Experimental Results 

This section presents preliminary results obtained using a mobile robot simulator.
The robot simulates sonars and a low cost laser range sensor, implemented
with a laser line generator and a camera. The laser sensor gives good
measurements within a range of 3 m. The simulated robot has a uniform random error
of ±10% on displacements and ±5% on rotations. We present two experiments
to test the deterministic and non deterministic prediction in two complex
environments.




Figure 5 shows a complex simulated environment of 17.5 x 10 m. At the
beginning there were 6 groups of hypotheses, one per room except in the two
rightmost rooms. After moving through 5 cells (of 10 x 10 cm) towards the
predicted goal cell, a sensing step is inserted and the prediction process is repeated
until there is only one group of hypotheses. At the end of the path followed by
the robot, there is only one group of hypotheses, the right one. In this case, both
types of prediction, deterministic and non deterministic, solved the localization
problem and led to similar paths.

Figure 6 shows another simulated environment of 12 x 15.5 m where the
corridor on the left is slightly longer than the corridor on the right. At the
beginning there were 2 groups of hypotheses, $C_0$ and $C_1$. The deterministic prediction
computes a goal cell $P_d$ near the intersection of corridors, while the non
deterministic prediction indicates a cell $P_{nd}$ in the bottom part of the roadmap. In
this case, the non deterministic prediction solves the localization problem and
the deterministic prediction fails. The deterministic prediction is faster (2
seconds versus 6 seconds on a Pentium III 733 MHz PC) but it fails to solve the
global localization problem.




Fig. 6. (left) The simulator. (right) Roadmap with clusters $C_0$ and $C_1$ and results from
the deterministic ($P_d$) and non deterministic ($P_{nd}$) prediction



6 Conclusions 

A robust approach to solve the global localization problem in indoor
environments has been presented. It can be seen as the application of two Markov
localization representations: one to track probable locations of the robot, and
another to predict movements of the robot when there is more than one group of
hypotheses. The second Markov representation can use a deterministic or a non
deterministic model for the movements of the robot. As the experiments confirm,
a non deterministic prediction is more robust than a deterministic one, especially
when the odometer of the robot or its sensors are not very accurate, or there are
long corridors in the environment. In these cases the non deterministic prediction
(using a model for the uncertainty of the robot) succeeds while a deterministic
prediction (modeled by a single rotation followed by a translation) can fail.

In the future, we plan to test this approach using real robots and environ- 
ments with long corridors and similar places. 
Abstract. The problem of imbalanced training data in supervised
methods is currently receiving growing attention. Imbalanced data means 
that one class is much more represented than the others in the training 
sample. It has been observed that this situation, which arises in sev- 
eral practical domains, may produce an important deterioration of the 
classification accuracy, in particular with patterns belonging to the less 
represented classes. In the present paper, we report experimental results 
that point at the convenience of correctly downsizing the majority class 
while simultaneously increasing the size of the minority one in order to 
balance both classes. This is obtained by applying a modification of the 
previously proposed Decontamination methodology. Combination of this 
proposal with the employment of a weighted distance function is also 
explored. 



1 Introduction 

Design of supervised pattern recognition methods is usually based on a training 
sample (TS): a collection of examples previously analyzed by a human expert. 
Performance of the resulting classification system depends on the quantity and 
the quality of the information contained in the TS. Recently, concern has arisen 
about the complications produced by imbalance in the TS. A TS is said to 
be imbalanced when one of the classes (the minority one) is heavily under- 
represented in comparison to the other (the majority) class. For simplicity, and 
consistently with the common practice [7,13], we consider here only two-class 
problems. It has been observed that imbalanced training samples may cause a 
significant deterioration in the performance attainable by standard supervised 
methods. High imbalance occurs in real-world domains where the decision system 
is aimed to detect a rare but important case, such as fraudulent telephone calls 
[9], oil spills in satellite images of the sea surface [12], an infrequent disease [17], 
or text categorization [14]. 

* Work partially supported by grants 32016-A (Mexican CONACyT), 744.99-P (Mexican
Cosnet), TIC2000-1703-C03-03 (Spanish CICYT) and P1-1B2002-07 (Fundació
Caixa Castelló-Bancaixa).
Most of the attempts for dealing with this problem can be categorized as [7]:

a) Over-sampling the minority class to match the size of the other class [5].

b) Downsizing the majority class so as to match the size of the other class [13].

c) Internally biasing the discrimination based process so as to compensate for
the class imbalance [8,12].

As pointed out by many authors, overall accuracy is not the best criterion to
assess the classifier’s performance in imbalanced domains. For instance, consider
a practical application where only 2% of the patterns belong to the minority
class. In such a situation, labeling all new patterns as members of the majority
class would give an accuracy of 98%. Obviously, this kind of system would be
useless. Consequently, other criteria have been proposed. One of the most widely
accepted criteria is the geometric mean, $g = (a^{+} \cdot a^{-})^{1/2}$, where $a^{+}$ is the
accuracy on cases from the minority class and $a^{-}$ is the accuracy on cases from
the majority one [13]. This measure tries to maximize the accuracy on each of
the two classes while keeping these accuracies balanced.
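
The 2% example can be checked numerically. The sketch below is only an illustration; the function name and the label encoding (1 for the minority class, 0 for the majority class) are assumptions.

```python
import math


def geometric_mean(y_true, y_pred, minority_label):
    """g = sqrt(a_plus * a_minus) for a two-class problem."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t == minority_label]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t != minority_label]
    a_plus = sum(t == p for t, p in pos) / len(pos)    # accuracy on the minority class
    a_minus = sum(t == p for t, p in neg) / len(neg)   # accuracy on the majority class
    return math.sqrt(a_plus * a_minus)


# 2% minority class; a classifier that always answers "majority".
y_true = [1] * 2 + [0] * 98
y_pred = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
g = geometric_mean(y_true, y_pred, minority_label=1)
```

The overall accuracy is 0.98 while $g$ is 0, which is why $g$ is preferred as the performance measure in imbalanced domains.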

In an earlier study [4], we provided preliminary results of several techniques
addressing the class imbalance problem. In that work, we focused on
under-sampling the majority class and also on internally biasing the discrimination
process, as well as on a combination of both approaches. In the present paper,
we introduce a new proposal for balancing the TS through reduction of the
majority class size and, at the same time, an increase in the number of prototypes
in the minority class. To this aim, we employ a modification of the
Decontamination methodology [2] that will be referred to as Restricted Decontamination.
We also explore the convenience of using this technique in combination with a
weighted distance measure aimed at biasing the classification procedure. These
ideas are evaluated over four real datasets using the Nearest Neighbor (NN) rule
for classification and the geometric mean as the performance measure.

The NN rule is one of the oldest and better-known algorithms for performing 
supervised nonparametric classification. The entire TS is stored in the computer 
memory. To classify a new pattern, its distance to each one of the stored training 
patterns is computed. The new pattern is then assigned to the class represented 
by its nearest neighboring training pattern. Performance of NN rule, as with any 
nonparametric method, is extremely sensitive to incorrectness or imperfections 
in the TS. Nevertheless, the NN rule is very popular because of its characteristics: 
a) conceptual simplicity, b) easy implementation, c) known error rate bounds, 
and d) potentiality to compete favorably in accuracy with other classification 
methods in real data applications. 
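
The NN rule as described can be sketched in a few lines. The training sample layout, a list of (feature vector, label) pairs, and the example points are assumptions made for the illustration.

```python
import math


def nn_classify(pattern, training_sample):
    """1-NN rule: return the label of the closest stored training pattern.

    training_sample is a list of (feature_vector, label) pairs.
    """
    nearest = min(training_sample, key=lambda item: math.dist(pattern, item[0]))
    return nearest[1]


ts = [((0.0, 0.0), "majority"), ((1.0, 0.0), "majority"), ((5.0, 5.0), "minority")]
label = nn_classify((4.0, 4.5), ts)
```

Every stored pattern is compared with the new one, which is why any reduction of the TS size also reduces the cost of classification.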

2 Related Works 

The two basic methods for resampling the TS cause the class distribution to
become more balanced. Nevertheless, both strategies have shown important
drawbacks. Under-sampling may throw out potentially useful data, while
over-sampling increases the TS size and hence the time to train a classifier. In
recent years, research has focused on improving these basic methods. Kubat and
Matwin [13] proposed an under-sampling technique that is aimed at removing
those majority prototypes that are “redundant” or that “border” the minority
instances. They assume that these bordering cases are noisy examples. However,
they do not use any of the well-known techniques for cleaning the TS.

Chawla et al. [5] proposed a technique for over-sampling the minority class. 
Instead of merely replicating prototypes of the minority class, they form new 
minority “synthetic” instances. This is done by taking each minority class in- 
stance and creating synthetic instances along the line segments joining any/all 
of the k minority class nearest neighbors. 
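
The idea of forming synthetic minority instances along line segments can be sketched as follows. This is a simplified illustration and not the exact procedure of [5]: the uniform interpolation factor, the random choice of one neighbor per new instance and all names are assumptions.

```python
import math
import random


def synthetic_instances(minority, k=2, per_instance=1, seed=0):
    """Create new minority points on segments joining each point to one of its
    k nearest minority neighbors (interpolation factor drawn uniformly in [0, 1))."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    created = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbors = others[:k]
        for _ in range(per_instance):
            nb = rng.choice(neighbors)
            gap = rng.random()
            created.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return created


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = synthetic_instances(minority, k=2, per_instance=2)
```

Each new point lies between two existing minority prototypes, so the minority class grows without exact replication.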

Barandela et al. [3] explore the convenience of designing a multiple classi- 
fication system for working in imbalanced situations. Instead of using a single 
classifier, an ensemble is implemented. The idea is to train each one of the indi- 
vidual components of the ensemble with a balanced TS. In order to achieve this, 
each individual component of the ensemble is trained with a subset of the TS. 
As many subsets of the TS as required to get balanced subsets are generated. 
The number of subsets is determined by the difference between the amount of 
prototypes from the majority class and that of the minority class. 

Pazzani et al. [15] take a slightly different approach when learning from 
an imbalanced TS by assigning different weights to prototypes of the different 
classes. On the other hand, Ezawa et al. [8] bias the classifier in favor of certain 
feature relationships. Kubat et al. [12] use some counter-examples to bias the 
recognition process. 



3 Proposed Strategies 

In several practical applications, class identification of prototypes is a difficult 
and costly task. There is another source of distortion in the training data: pro- 
totypes with errors in some attribute values and instances that are atypical or 
exceptional. Generalization accuracy of the supervised method may be degraded 
by the presence of incorrectness or imperfections in the TS. Particularly sensi- 
tive to these facts are nonparametric classifiers whose training is not based upon 
any assumption about probability density functions. This explains the emphasis 
given to the evaluation of procedures used to collect and to clean the TS. 

In a previous work [2], a methodology for correcting a TS while employing
nonparametric classifiers was presented. The Decontamination procedure
can be regarded as a cleaning process that removes some elements of the TS and
corrects the label of several others while retaining them. Experimental results
with both simulated and real datasets have shown that the Decontamination
methodology makes it possible to cope with all types of imperfections (mislabeled,
noisy, atypical or exceptional) in the TS, improving the classifier’s performance and
lowering its computational burden. The Decontamination methodology is based
on two previously published editing techniques.
3.1 Basic Editing Techniques

Editing techniques are mainly aimed at improving the performance of the NN 
rule by filtering the training prototypes. As a byproduct, they also obtain a 
decrease in the TS size and, consequently, a reduction of the computational cost 
of the classification method. The first work of editing corresponds to Wilson [16] 
and several others have followed. 



Wilson’s Editing procedure. This technique consists of applying the $k$-NN
($k > 1$) classifier to estimate the class label of every prototype in the TS. Those
instances whose class label does not agree with the class associated with the
majority of the $k$ neighbors are discarded. The procedure is:

1. Let $S = X$ ($X$ is the original TS and $S$ will be the edited TS)

2. For each $x$ in $X$ do:

a) Find the $k$ nearest neighbors of $x$ in $X - \{x\}$

b) Discard $x$ from $S$ if its label disagrees with the class associated with the
largest number of the $k$ neighbors.
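
The procedure above can be sketched as follows, assuming the TS is a list of (feature vector, label) pairs and the Euclidean distance is used. Ties in the vote are resolved by whichever label `Counter.most_common` returns first, which is an assumption; with an odd $k$ and two classes no tie can occur.

```python
import math
from collections import Counter


def k_nearest_labels(x, sample, k, skip_index):
    """Labels of the k nearest neighbors of x in sample, leaving out sample[skip_index]."""
    others = [(math.dist(x, p), lab) for i, (p, lab) in enumerate(sample) if i != skip_index]
    others.sort(key=lambda pair: pair[0])
    return [lab for _, lab in others[:k]]


def wilson_editing(X, k=3):
    """Keep the prototypes whose label agrees with the majority of their k neighbors."""
    S = []
    for i, (x, label) in enumerate(X):
        votes = Counter(k_nearest_labels(x, X, k, i))
        winner, _ = votes.most_common(1)[0]
        if winner == label:
            S.append((x, label))
    return S


# Two clean clusters plus one prototype labeled "b" inside the "a" cluster.
X = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
     ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
     ((0.5, 0.5), "b")]
S = wilson_editing(X, k=3)
```

In this toy sample only the prototype at (0.5, 0.5) is discarded, because its three nearest neighbors all belong to class "a".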



Generalized Editing (GE: Koplowitz and Brown [11]). This is a modification
of Wilson’s algorithm. Out of concern with the possibility of too many
prototypes being removed from the TS because of Wilson’s editing procedure,
this approach consists of removing some suspicious prototypes and changing
the class labels of some other instances. Accordingly, it can be regarded as a
technique for modifying the structure of the TS (through re-labeling of some
prototypes and not only for eliminating atypical instances). In GE, two
parameters have to be defined, $k$ and $k'$, in such a way that $(k + 1)/2 \le k' \le k$. This
editing algorithm can be written as follows:

1. Let $S = X$ ($X$ is the original training set and $S$ will be the processed TS)

2. For each $x$ in $X$ do:

a) Find the $k$ nearest neighbors of $x$ in $X - \{x\}$.

b) If a class has at least $k'$ representatives among those $k$ neighbors, then
label $x$ according to that class (independently of its original class label).
Otherwise, discard it from $S$.
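
A sketch of the GE steps under the same assumptions as before (list of (feature vector, label) pairs, Euclidean distance, hypothetical function names) is given below. Since $k' \ge (k + 1)/2$, at most one class can reach $k'$ votes.

```python
import math
from collections import Counter


def k_nearest_labels(x, sample, k, skip_index):
    """Labels of the k nearest neighbors of x in sample, leaving out sample[skip_index]."""
    others = [(math.dist(x, p), lab) for i, (p, lab) in enumerate(sample) if i != skip_index]
    others.sort(key=lambda pair: pair[0])
    return [lab for _, lab in others[:k]]


def generalized_editing(X, k=3, k_prime=2):
    """Relabel x with the class holding at least k_prime of its k neighbors, else drop x."""
    if not (k + 1) / 2 <= k_prime <= k:
        raise ValueError("k_prime must satisfy (k + 1) / 2 <= k_prime <= k")
    S = []
    for i, (x, _) in enumerate(X):
        votes = Counter(k_nearest_labels(x, X, k, i))
        winner, count = votes.most_common(1)[0]
        if count >= k_prime:
            S.append((x, winner))
    return S


X = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
     ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
     ((0.5, 0.5), "b")]
S = generalized_editing(X, k=3, k_prime=2)
```

Unlike Wilson's editing, the suspicious prototype at (0.5, 0.5) is not removed here: it is kept and relabeled as class "a".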



3.2 The Decontamination Methodology in Brief 

The Decontamination methodology involves several applications of the GE
technique, followed by the employment, also repeatedly, of Wilson’s Editing
algorithm. Repetition in the application of each one of these techniques stops if
one of the following criteria is fulfilled:

1. Stability in the structure of the TS has been reached (no more removals and
no more re-labeling).

2. Estimate of the misclassification rate (leave-one-out method; see [10]) has
begun to increase.

3. One class has been emptied (all its representatives in the TS have been
removed or transferred to another class) or has been left with too few
prototypes (fewer than five training instances for each attribute).



3.3 Proposed Modification of the Decontamination Methodology 

In the present paper, we present a modification of the Decontamination
methodology: the Restricted Decontamination. In this restricted form, the
Decontamination process is applied only to the majority class. That is, changes of label or
removal from the TS affect only those prototypes representing the majority
class. In this way, a decrease in the number of prototypes of the majority class
is obtained. At the same time, some prototypes, originally in the majority class,
are incorporated (by changing their labels) into the minority class, increasing the
size of this latter class.

In the present work, Restricted Decontamination is employed for the first
time for handling imbalance. This restricted procedure was initially designed to
handle situations when information about the particular application area could
imply existence of contamination in only some of the classes [1]. The source for
this information could be given by some characteristics of the process used to
collect the TS or by the intrinsic nature of the problem at hand.
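
The restriction can be illustrated with a single Generalized Editing pass in which only majority prototypes may be relabeled or removed. This is a sketch under stated assumptions, not the full procedure: the repeated GE and Wilson passes and the stopping criteria are omitted, and searching for neighbors in the whole TS is an assumption.

```python
import math
from collections import Counter


def restricted_ge_pass(X, majority_label, k=3, k_prime=2):
    """One Generalized Editing pass that may only relabel or drop majority prototypes."""
    S = []
    for i, (x, label) in enumerate(X):
        if label != majority_label:
            S.append((x, label))  # minority prototypes are never touched
            continue
        others = [(math.dist(x, p), lab) for j, (p, lab) in enumerate(X) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(lab for _, lab in others[:k])
        winner, count = votes.most_common(1)[0]
        if count >= k_prime:
            S.append((x, winner))  # kept, possibly moved to the minority class
    return S


# Hypothetical TS: 4 minority prototypes, 7 majority ones (one inside the minority region).
X = [((0.0, 0.0), "min"), ((0.0, 1.0), "min"), ((1.0, 0.0), "min"), ((1.0, 1.0), "min"),
     ((5.0, 5.0), "maj"), ((5.0, 6.0), "maj"), ((6.0, 5.0), "maj"), ((6.0, 6.0), "maj"),
     ((7.0, 6.0), "maj"), ((7.0, 7.0), "maj"), ((0.5, 0.5), "maj")]
S = restricted_ge_pass(X, majority_label="maj")
sizes = Counter(label for _, label in S)
```

The majority class shrinks from 7 to 6 prototypes and the minority class grows from 4 to 5, which is the balancing effect described above.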



3.4 The Weighted Distance Function 

As a technique for internally biasing the discrimination procedure, we have ex- 
perimented with a modification of the Euclidean metric that can be regarded as 
a weighted distance function [4]. With this modification, when classification of 
a new pattern y is attempted, and in the search through the TS of its nearest 
neighbor, the following quantity must be computed for each training instance x: 

$$d_W(y, x) = (n_i / n)^{1/m} \, d_E(y, x)$$

where $i$ refers to the class of instance $x$, $n_i$ is the number of training patterns
from this class, $n$ is the TS size, $m$ is the dimensionality of the feature space and
$d_E(\cdot)$ is the Euclidean metric.

The idea behind this distance proposal is to compensate for the imbalance in 
the TS without actually altering the imbalance. Weights are assigned, unlike in 
the usual weighted $k$-NN rule proposals, to the respective classes and not to the
individual prototypes. In that way, since the weighting factor is greater for the 
majority class than for the minority one, distance values to training instances 
of the minority class are much more reduced than the distance values to the 
training examples of the majority class. This produces a tendency for the new 
patterns to find their nearest neighbor among the cases of the minority class, 
increasing the accuracy in that class. 
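
The effect of the class weights can be seen in a small sketch that applies the factor $(n_i / n)^{1/m}$ to the Euclidean distance inside a 1-NN search. The data layout and the example points are assumptions made for the illustration.

```python
import math
from collections import Counter


def weighted_nn_classify(y, training_sample):
    """1-NN with the class-weighted distance d_W(y, x) = (n_i / n) ** (1 / m) * d_E(y, x)."""
    n = len(training_sample)
    m = len(y)
    class_sizes = Counter(label for _, label in training_sample)

    def d_w(item):
        x, label = item
        return (class_sizes[label] / n) ** (1.0 / m) * math.dist(y, x)

    return min(training_sample, key=d_w)[1]


# 9 majority prototypes and 1 minority prototype in a 2-dimensional feature space.
ts = [((0.0, 0.0), "minority"), ((3.0, 0.0), "majority")]
ts += [((10.0 + i, 10.0), "majority") for i in range(8)]
y = (2.0, 0.0)
plain = min(ts, key=lambda item: math.dist(y, item[0]))[1]
weighted = weighted_nn_classify(y, ts)
```

With the plain Euclidean metric the new pattern is assigned to the majority class (distance 1 versus 2), whereas the weighted distance favors the minority prototype (about 0.63 versus 0.95).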
Table 1. Characterization of the datasets employed in the experiments

Dataset    Attributes   Training Sample      Test Sample
                        class 1   class 2    class 1   class 2
Phoneme         5        1268      3054        318       764
Satimage       36         500      4647        126      1162
Glass           9          24       150          5        35
Vehicle        18         170       508         42       126



4 Experimental Results 

The Restricted Decontamination proposal, and its combination with the
weighted distance in the classification stage, are assessed through experiments
carried out with four real datasets taken from the UCI Database Repository [6].
In each dataset, five-fold cross validation was employed (80% for the TS and
20% for a test set). The results presented hereafter are the averaged
values of the five replications. To facilitate comparison with other published
results [13], in the Glass set the problem was transformed into discriminating class
7 against all the other classes, and in the Vehicle dataset the task is to classify
class 1 against all the others. The Satimage dataset was also mapped to a
two-class case: the training patterns of classes 1, 2, 3, 5, and 6 were joined to
form a single class and the original class 4 was left as the minority one. These
modified datasets are described in Table 1. As can be seen, class 2 is now the
majority class and class 1 is the minority one.

The results are shown in Table 2. The average g values obtained when
classifying with the original TSs, and with these TSs after processing them
with the idea of Kubat and Matwin [13], are also included for comparison
purposes. For a better illustration, results produced by the usual Decontamination
procedure [2] are reported too. The Restricted Decontamination proposed here
yields an improvement in performance (as measured by the g criterion), in
comparison to all the other methods. This improvement is more remarkable when the
weighted distance is employed for classifying new patterns. It is also important
to note that the results from the procedure of Kubat and Matwin are surpassed in
all datasets. The usual Decontamination methodology has been shown to
produce important benefits [2] when considering the general accuracy, but it is not
suitable in those cases when imbalance in the TS is present.



Table 2. Averaged mean values (and standard deviations) of the g criterion

Procedure                                   Phoneme   Satimage   Glass   Vehicle
Original TS                                   73.8      70.9      86.7     56.0
Decontamination & Euclidean classif.          69.6      67.3      84.6     46.8
Restricted Decontam. & Euclidean classif.     73.8      75.4      86.2     66.4
Decontamination & Weighted classif.           73.6      68.9      84.6     49.7
Restricted Decontam. & Weighted classif.      74.6      77.4      87.9     66.3
Kubat and Matwin                              68.3      72.9      79.0     65.4
The effects of the Restricted Decontamination can be better analyzed by
considering the balance obtained in the TS after its application (see Table 3).
Results in this table indicate a decrease in the size of the majority class (number
2), while the minority class (number 1) size is increased. On the other hand, the
usual Decontamination procedure worsens the imbalance in the TS, when
compared with the original TS. The proposal of Kubat and Matwin, by
aggressively under-sampling the majority class, produces an imbalance in the other
direction, very remarkable in the Phoneme and Glass datasets.



Table 3. Percentage of patterns in each class

Procedure            Phoneme            Satimage           Glass              Vehicle
                     class 1  class 2   class 1  class 2   class 1  class 2   class 1  class 2
Original TS           29.34    70.66      9.71    90.29     13.79    86.21     25.07    74.93
Decontamination       25.68    74.32     10.04    89.96     11.68    88.32     15.24    84.76
Restricted Decont.    35.04    64.98     15.09    84.91     15.06    84.94     43.97    56.03
Kubat and Matwin      85.76    14.24     52.40    47.60     75.00    25.00     57.75    42.25



5 Concluding Remarks 

In many real-world applications, supervised pattern recognition methods have
to cope with highly imbalanced TSs. Traditional learning systems such as the
NN rule can be misled when applied to such practical problems. This effect
can be mitigated by using a procedure that under-samples the majority class
while over-sampling the minority class. In this direction, a new approach has
been proposed in this paper. The Restricted Decontamination has been shown to
improve the balance in the TS. Classification with the weighted distance, after
preprocessing the TS with the Restricted Decontamination, has produced an
important improvement in the resulting g value when compared with the
original TS. These results also surpass those obtained by the proposal of
Kubat and Matwin. This can be explained by the fact that the proposal of Kubat
and Matwin is based upon techniques for eliminating redundant instances and not
for cleaning the TS of noisy or atypical prototypes.
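
For readers who want to reproduce the kind of figure reported in the tables, the following minimal sketch computes a g value for a two-class problem. It assumes the usual definition attributed to Kubat and Matwin, namely the geometric mean of the accuracies measured separately on each class; that definition is not restated in this section, and the labels used below are made up for illustration.

```python
import math

def g_mean(y_true, y_pred, minority=1, majority=2):
    """Geometric mean of the per-class accuracies (two-class case)."""
    def class_accuracy(label):
        idx = [i for i, y in enumerate(y_true) if y == label]
        if not idx:
            raise ValueError(f"no patterns of class {label}")
        hits = sum(1 for i in idx if y_pred[i] == label)
        return hits / len(idx)
    return math.sqrt(class_accuracy(minority) * class_accuracy(majority))

# Hypothetical labels: 2 minority patterns (class 1), 8 majority patterns (class 2).
y_true = [1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
y_pred = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, g_mean(y_true, y_pred))
```

In this toy case the overall accuracy is high although half of the minority class is misclassified, while the geometric mean is pulled down by the poor minority accuracy, which is why such a criterion is preferred for imbalanced TSs.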

Benefits of the proposal are shown even in the Glass dataset. This dataset
suffers not only from the imbalance problem; its minority class is also too
small. Adequacy of the TS size must be measured by considering the number of
prototypes in the smaller class and not in the whole TS. For the minority class
in the Glass dataset, the size/dimensionality ratio is very low: only 2.7 instances
for each attribute. Restricted Decontamination and weighted distance have been
able to handle this critical situation.

More extensive research is currently being conducted to explore all the
issues linked to imbalanced TSs. At present, we are studying the convenience
of applying genetic algorithms to reach a better balance among classes. We are
also experimenting in situations with more than two classes, as well as
investigating the convenience of using these procedures to obtain a better
performance with other classifiers, such as neural network models.

Abstract. In this paper we address the problem of estimating the parameters
of a Gaussian mixture model. Although the EM (Expectation-Maximization)
algorithm yields the maximum-likelihood solution, it has several problems:
(i) it requires a careful initialization of the parameters;
(ii) the optimal number of kernels in the mixture may be unknown beforehand.
We propose a criterion based on the entropy of the pdf (probability
density function) associated with each kernel to measure the quality of a
given mixture model, and a modification of the classical EM algorithm to
find the optimal number of kernels in the mixture. We test this method
with synthetic and real data and compare the results with those obtained
with the classical EM with a fixed number of kernels.



1 Introduction 

Gaussian mixture models have been widely used in the field of statistical pattern
recognition. One of the most common methods for fitting mixtures to data is the
EM algorithm [4]. However, this algorithm is prone to initialization errors and, in
these conditions, it may converge to local maxima of the log-likelihood function.
In addition, the algorithm requires that the number of elements (kernels) in the
mixture be known beforehand. For a given number of kernels, the EM algorithm
yields a maximum-likelihood solution, but this does not ensure that the pdf of the
data (multi-dimensional patterns) is properly estimated. A maximum-likelihood
criterion with respect to the number of kernels is not useful because it tends to
use a kernel to describe each pattern.

The so-called model-selection problem has been addressed in many ways.
Some approaches start with a small number of kernels and add new kernels
when necessary. For instance, in [14], the kurtosis is used as a measure of
non-Gaussianity, yielding a test for splitting a kernel in one-dimensional data. In
[15] this method is extended to the multi-dimensional case. This approach has
some drawbacks, because kurtosis can be very sensitive to outliers. In [16] a
greedy method is proposed, which performs a global search in combination with
another local search whenever a new kernel is added.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 432-439, 2003. 

© Springer-Verlag Berlin Heidelberg 2003

Other model-selection methods start with a large number of kernels and
proceed to fuse them. In [5], [6], the EM algorithm is initialized with many
kernels randomly placed and then the minimum description length principle [9]
is applied to iteratively remove some of the kernels until the optimal number
of them is found. In [11], the proposed algorithm is allowed both to split and
fuse kernels. Kernel fusion arises when many patterns have the same posterior
probability, and splitting is driven by the Kullback-Leibler divergence between a
component density and the empirical density in the neighborhood of the component.
In this approach, the number of components remains unchanged.

In this paper we propose a method that, starting with a few kernels (typically
one), finds the maximum-likelihood solution. Then it tests whether the underlying
pdf of each kernel is Gaussian; otherwise it replaces that kernel with two
kernels adequately separated from each other. In order to detect non-Gaussianity
we compare the entropy of the underlying pdf with the theoretical entropy of
a Gaussian. After two new kernels are introduced, our method performs several
steps of partial EM in order to obtain a new maximum-likelihood solution.

2 Gaussian-Mixture Models 

A d-dimensional random variable $y$ follows a finite-mixture distribution when its
pdf $p(y|\Theta)$ can be described by a weighted sum of known pdfs named kernels.
When all these kernels are Gaussian, the mixture is named in the same way:

$$p(y|\Theta) = \sum_{i=1}^{K} \pi_i\, p(y|\Theta_i), \quad \text{where } 0 \le \pi_i \le 1,\ i = 1, \ldots, K, \text{ and } \sum_{i=1}^{K} \pi_i = 1, \qquad (1)$$

$K$ being the number of kernels, $\pi_1, \ldots, \pi_K$ the a priori probabilities of each
kernel, and $\Theta_i$ the parameters describing the kernel. In Gaussian mixtures,
$\Theta_i = \{\mu_i, \Sigma_i\}$, that is, the average vector and the covariance matrix.

The set of parameters of a given mixture is $\Theta = \{\Theta_1, \ldots, \Theta_K, \pi_1, \ldots, \pi_K\}$.
Obtaining the optimal set of parameters $\Theta^*$ is usually posed in terms of maximizing
the log-likelihood of the pdf to be estimated:

$$\ell(Y|\Theta) = \log p(Y|\Theta) = \log \prod_{n=1}^{N} p(y_n|\Theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, p(y_n|\Theta_k). \qquad (2)$$

$$\Theta^* = \arg\max_{\Theta} \ell(\Theta). \qquad (3)$$

where $Y = \{y_1, \ldots, y_N\}$ is a set of $N$ i.i.d. samples of the variable $Y$.
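
The log-likelihood of Equation 2 can be evaluated directly once the mixture parameters are fixed. The sketch below is only illustrative: it assumes diagonal covariance matrices to stay within the standard library, and the function names are hypothetical.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log-density at y of a Gaussian with diagonal covariance (list of variances)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def log_likelihood(samples, priors, means, variances):
    """Sum over the samples of the log of the prior-weighted kernel densities."""
    total = 0.0
    for y in samples:
        # log(pi_k * p(y | Theta_k)) for every kernel k
        terms = [math.log(p) + log_gauss_diag(y, m, v)
                 for p, m, v in zip(priors, means, variances)]
        # log-sum-exp keeps the inner sum numerically stable
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# One kernel centred on the only sample: the result is -log(2*pi) in two dimensions.
print(log_likelihood([[0.0, 0.0]], [1.0], [[0.0, 0.0]], [[1.0, 1.0]]))
```

Working with logarithms of each term, instead of multiplying densities, avoids underflow when the number of samples $N$ is large.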

2.1 EM Algorithm 

The EM (Expectation-Maximization) algorithm [4] is an iterative procedure that
allows us to find maximum-likelihood solutions to problems involving hidden
variables. The EM algorithm generates a sequence of parameter estimates
$\{\Theta^*(t),\ t = 1, 2, \ldots\}$ by alternating an expectation step and a maximization
step until convergence. In the case of mixtures [8], the hidden variable can be
regarded as the kernel from which each datum has been sampled. The E-step estimates
the posterior probability that the datum $y_n$ was sampled from kernel $k$:

$$p(k|y_n) = \frac{\pi_k\, p(y_n|k)}{\sum_{j=1}^{K} \pi_j\, p(y_n|j)} \qquad (4)$$

In the M-step, the new parameters $\Theta^*(t + 1)$ are given by:

$$\pi_k = \frac{1}{N}\sum_{n=1}^{N} p(k|y_n), \quad \mu_k = \frac{\sum_{n=1}^{N} p(k|y_n)\, y_n}{\sum_{n=1}^{N} p(k|y_n)} \quad \text{and} \quad \Sigma_k = \frac{\sum_{n=1}^{N} p(k|y_n)\,(y_n - \mu_k)(y_n - \mu_k)^T}{\sum_{n=1}^{N} p(k|y_n)} \qquad (5)$$



A detailed description of this classic algorithm is given in [8]. Here we focus
on the fact that if $K$ is unknown beforehand it cannot be estimated by
maximizing the log-likelihood, because $\ell(\Theta)$ grows with $K$. Fig. 1 shows the
effect of using only one kernel, in the classical EM algorithm with a fixed number
of kernels, to describe two Gaussian distributions: the density is underestimated,
giving a poor description of the data. In the next section we describe the use of
entropy to test whether a given kernel properly describes the underlying data.
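
The alternation of E and M steps can be sketched as follows. This is a minimal illustration for one-dimensional data, written with the standard library only; the small variance floor and the toy data are assumptions added for the example and are not part of the method described in the text.

```python
import math

def normal_pdf(y, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, priors, means, variances):
    """One E-step followed by one M-step for a 1-D Gaussian mixture."""
    n, k = len(data), len(priors)
    # E-step: posterior probability p(k | y_n) of each kernel for each sample
    post = []
    for y in data:
        joint = [priors[j] * normal_pdf(y, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        post.append([v / total for v in joint])
    # M-step: re-estimate priors, means and variances from the posteriors
    new_priors, new_means, new_vars = [], [], []
    for j in range(k):
        weight = sum(post[i][j] for i in range(n))
        mean = sum(post[i][j] * data[i] for i in range(n)) / weight
        var = sum(post[i][j] * (data[i] - mean) ** 2 for i in range(n)) / weight
        new_priors.append(weight / n)
        new_means.append(mean)
        new_vars.append(max(var, 1e-6))  # assumed floor against a collapsing kernel
    return new_priors, new_means, new_vars

def fit(data, priors, means, variances, steps=20):
    """Run a fixed number of EM iterations."""
    for _ in range(steps):
        priors, means, variances = em_step(data, priors, means, variances)
    return priors, means, variances

# Toy bimodal data with groups around 0 and around 5.
data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
print(fit(data, [1.0], [2.0], [1.0])[1])                  # one kernel: mean between the groups
print(fit(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])[1])   # two kernels: one mean per group
```

With a single kernel the estimated mean falls between the two groups, which is the one-dimensional analogue of the situation illustrated in Fig. 1.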




Fig. 1. The classic EM algorithm erroneously fits data of a bimodal distribution (with
averages $\mu_1 = [0, 0]$ and $\mu_2 = [3, 2]$) (left) to a Gaussian with $\mu = [1.5, 1]$ (right).



3 Entropy Estimation 

Entropy is a basic concept in information theory. The entropy of a given variable
$Y$ can be interpreted in terms of information, randomness, dispersion, and so on
[3], [10]. For a discrete variable we have:

$$H(Y) = -E_y[\log P(Y)] = -\sum_{i=1}^{N} P(Y = y_i) \log P(Y = y_i), \qquad (6)$$

where $y_1, \ldots, y_N$ is the set of values of variable $Y$. A fundamental result of
information theory is that Gaussian variables have the maximum entropy among all
the variables with equal variance. Consequently, the entropy of the underlying
distribution of a kernel should reach a maximum when such a distribution is
Gaussian. This theoretical maximum entropy is given by:

$$H(Y) = \frac{1}{2}\log\left[(2\pi e)^d\, |\Sigma|\right]. \qquad (7)$$

Then, in order to decide whether a given kernel is truly Gaussian or must be
replaced by two other kernels, we compare the estimated entropy of the underlying
data with the entropy of a Gaussian. However, one of the main problems of
this approach is that we must estimate, in principle, the pdf from only a few
samples [12], [13], [17].



3.1 Entropy Estimation with Parzen’s Windows 

The Parzen windows approach [7] is a non-parametric method for estimating
pdfs from a finite set of patterns. The general form of these pdfs, using a Gaussian
kernel and assuming a diagonal covariance matrix $\psi = \mathrm{Diag}(\sigma_1^2, \ldots, \sigma_d^2)$, is:

$$P^*(Y, a) = \frac{1}{N_a} \sum_{y_a \in a} \prod_{j=1}^{d} \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y^j - y_a^j}{\sigma_j}\right)^2\right) \qquad (8)$$

where $a$ is a sample of the variable $Y$, $N_a$ is the size of the sample, $y^j$ represents
the $j$-th component of $y$ and $y_a^j$ represents the $j$-th component of kernel $y_a$.
In [12] a method is proposed for adjusting the widths of the kernels using
maximum likelihood. Given the definition of entropy in Equation 6, we have:

$$H_b(Y) = -E_b[\log P(Y)] = -\frac{1}{N_b}\sum_{y_b \in b} \log P(y_b) = -\frac{1}{N_b}\log \ell(b), \qquad (9)$$



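where $\ell(b)$ is the likelihood of the data. As maximizing the likelihood is equivalent
to minimizing the entropy, this approach consists of estimating the derivative of
the entropy with respect to the widths of the kernels and performing a gradient
descent towards the optimal widths:

$$\frac{\partial}{\partial \sigma_d} H^*(Y) = \frac{1}{N_b} \sum_{y_b \in b} \sum_{y_a \in a} \frac{K_\psi(y_b - y_a)}{\sum_{y_a' \in a} K_\psi(y_b - y_a')} \left(\frac{1}{\sigma_d}\right) \left(1 - \frac{[y_b - y_a]_d^2}{\sigma_d^2}\right) \qquad (10)$$

where $K_\psi$ denotes the Gaussian window of Equation 8 and $\sigma_d$ is the standard
deviation in each dimension.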

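Given the optimal widths of the kernel, the entropy is estimated by

$$H^*(Y) = -\frac{1}{N_b}\sum_{y_b \in b} \log\left(\frac{1}{N_a}\sum_{y_a \in a} K_\psi(y_b - y_a)\right). \qquad (11)$$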
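
A Parzen-based entropy estimate of this kind can be sketched as follows. The code is a minimal illustration, assuming diagonal Gaussian windows and two disjoint samples `a` and `b`; the widths passed in the usage example are arbitrary values chosen for the demonstration, not widths optimised by the gradient descent described above.

```python
import math
import random

def parzen_density(y, sample_a, sigmas):
    """Average of diagonal Gaussian windows centred on the points of sample_a."""
    norm = 1.0
    for s in sigmas:
        norm *= math.sqrt(2.0 * math.pi) * s
    total = 0.0
    for ya in sample_a:
        expo = sum(((yj - yaj) / s) ** 2 for yj, yaj, s in zip(y, ya, sigmas))
        total += math.exp(-0.5 * expo)
    return total / (len(sample_a) * norm)

def parzen_entropy(sample_a, sample_b, sigmas):
    """Minus the mean log-density of sample_b under the windows placed on sample_a."""
    logs = [math.log(parzen_density(yb, sample_a, sigmas)) for yb in sample_b]
    return -sum(logs) / len(sample_b)

def draw(n, rng):
    """Hypothetical 2-D Gaussian sample with standard deviations 0.6 and 0.3."""
    return [[rng.gauss(0.0, 0.6), rng.gauss(0.0, 0.3)] for _ in range(n)]

rng = random.Random(0)
print(parzen_entropy(draw(75, rng), draw(75, rng), sigmas=[0.3, 0.15]))
```

Using one sample to place the windows and a second one to evaluate the log-density keeps the estimate from being dominated by each point's own window.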

In Fig. 2 we show the entropy estimation obtained for a sample of a 2D
Gaussian variable with a diagonal covariance matrix with $\sigma_1^2 = 0.36$ and
$\sigma_2^2 = 0.09$, for different widths. The approximation of the maximum entropy
defined in Equation 7 is 1.12307. From the shape of this function, it can be deduced
that the optimal widths lie in a wide interval and consequently their choice is not
so critical.
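
The value quoted for the theoretical maximum can be checked numerically. The sketch below evaluates the Gaussian entropy $\frac{1}{2}\log[(2\pi e)^d |\Sigma|]$ for a diagonal covariance matrix, in which case the determinant is simply the product of the variances; the function name is hypothetical.

```python
import math

def gaussian_max_entropy(variances):
    """0.5 * log((2*pi*e)^d * |Sigma|) for a diagonal covariance matrix."""
    d = len(variances)
    log_det = sum(math.log(v) for v in variances)  # log of the product of variances
    return 0.5 * (d * math.log(2.0 * math.pi * math.e) + log_det)

# Variances of the 2-D example discussed in the text.
print(gaussian_max_entropy([0.36, 0.09]))
```

The result is approximately 1.1231, in agreement with the figure of 1.12307 given in the text.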

Fig. 2. Entropy as a function of the widths of the Parzen kernels.



4 Optimal Model Selection with Maximum Entropy 

4.1 Proposed Method 

Comparing the estimations given by Equations 7 and 11, we have a way of
quantifying the degree of Gaussianity of a given kernel. Given a set of kernels
for the mixture (initially one kernel) we evaluate the real global entropy $H(y)$
and the theoretical maximum entropy $H_{max}(y)$ of the mixture by considering
the individual pairs of entropies for each kernel, and the prior probabilities:

$$H(y) = \sum_{k=1}^{K} \pi_k H_k(y) \quad \text{and} \quad H_{max}(y) = \sum_{k=1}^{K} \pi_k H_{max_k}(y).$$

If the ratio $H(y)/H_{max}(y)$ is above a given threshold (typically 0.95) we consider
that all kernels are well fitted. Otherwise, we select the kernel with the lowest
individual ratio and replace it with two other kernels that are conveniently
placed. Then, a new EM starts.

As the estimation of the entropy of a kernel requires two data sets, we select
the data whose distance to the average $\mu_k$ is between the limits of a Gaussian:
$-3\sqrt{\lambda_i} < b_i < 3\sqrt{\lambda_i}$, with $b = P_k^T(\mu_k - y)$. Here $\lambda_i$, with $i = 1, 2, \ldots, d$, are the
eigenvalues associated with the kernel, and $b$ is the projection of a datum $y$ on the
eigenspace spanned by the eigenvectors of the covariance matrix collected in $P_k$.



4.2 Introducing a New Kernel 

A low local ratio $H(y)/H_{max}(y)$ indicates that multimodality arises and thus
the kernel must be replaced by two other kernels. Applying PCA (Principal
Component Analysis) to the original kernel, we find that the main eigenvector
indicates the direction of maximum variability, and we can put the two new
kernels along the opposite senses of this direction (Fig. 3). If $k$ is the kernel with
low Gaussianity, after splitting it the two new kernels $k_1$ and $k_2$, with parameters
$\Theta_{k_1} = (\mu_{k_1}, \Sigma_{k_1})$ and $\Theta_{k_2} = (\mu_{k_2}, \Sigma_{k_2})$, have the following initial averages:
$\mu_{k_1} = \mu_k + \sqrt{\lambda}\,V$ and $\mu_{k_2} = \mu_k - \sqrt{\lambda}\,V$, with $\lambda$ the principal eigenvalue for kernel $k$
and $V$ its associated normalized eigenvector. Furthermore, the width of the two
new kernels is divided by two. If $\lambda'_k$ is the main eigenvalue in both kernels, then
$\sqrt{\lambda'_k} = \frac{1}{2}\sqrt{\lambda_k}$; consequently, $\Sigma_{k_1} = \Sigma_{k_2} = \frac{1}{4}\Sigma_k$. Finally, the new priors should also
verify $\sum_{k=1}^{K} \pi_k = 1$, so we initialize them with $\pi_{k_1} = \pi_{k_2} = \frac{1}{2}\pi_k$. The proposed
algorithm is described in Fig. 4.

Fig. 3. The direction of maximum variability is given by the eigenvector associated
with the highest eigenvalue $\lambda_1$.
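
The splitting step can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' implementation: it assumes that the two new averages are displaced by $\pm\sqrt{\lambda}\,V$ along the principal axis, that halving the width corresponds to dividing the covariance matrix by four, and that the prior is shared equally. The function name and the example kernel are hypothetical.

```python
import numpy as np

def split_kernel(prior, mean, cov):
    """Replace one kernel by two kernels placed along its principal axis."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues returned in ascending order
    lam = eigvals[-1]                        # principal eigenvalue
    v = eigvecs[:, -1]                       # associated unit-norm eigenvector
    offset = np.sqrt(lam) * v
    new_cov = cov / 4.0                      # widths (standard deviations) halved
    return [
        (prior / 2.0, mean + offset, new_cov.copy()),
        (prior / 2.0, mean - offset, new_cov.copy()),
    ]

# Hypothetical kernel elongated along the first axis.
parts = split_kernel(1.0, np.array([0.0, 0.0]), np.array([[4.0, 0.0], [0.0, 1.0]]))
for prior, mean, cov in parts:
    print(prior, mean, cov.tolist())
```

Because the sign of an eigenvector is arbitrary, the order of the two new kernels is not fixed, but they always lie symmetrically on either side of the original average.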



Initialization: Start with a unique kernel, $K = 1$. $\Theta_1 = \{\mu_1, \Sigma_1\}$ with random values.

Repeat: Main loop
    Repeat: E, M Steps
        Estimate log-likelihood in iteration $i$: $\ell_i$
    Until: $|\ell_i - \ell_{i-1}|$ < convergence_threshold
    Evaluate $H(Y)$ and $H_{max}(Y)$ globally
    If ($H(Y)/H_{max}(Y)$ < entropy_threshold)
        Select kernel $k$ with the lowest ratio and decompose into $k_1$ and $k_2$
        Initialize parameters $\Theta_{k_1}$ and $\Theta_{k_2}$
        Initialize new averages: $\mu_{k_1} = \mu_k + \sqrt{\lambda}\,V$, $\mu_{k_2} = \mu_k - \sqrt{\lambda}\,V$
        Initialize new covariance matrices: $\Sigma_{k_1} = \Sigma_{k_2} = \frac{1}{4}\Sigma_k$
        Set new a priori probabilities: $\pi_{k_1} = \pi_{k_2} = \frac{1}{2}\pi_k$
    Else
        Final = True
Until: Final = True

Fig. 4. Our maximum-entropy algorithm

4.3 Validation of the Method

In order to test our approach we have performed several experiments with
synthetic and real data. In the first one we have generated 2500 samples from 5
bi-dimensional Gaussians with prior probabilities $\pi_k = 0.2\ \forall k$. Their averages
are $\mu_1 = [-1, -1]^T$, $\mu_2 = [6, 3]^T$, $\mu_3 = [3, 6]^T$, $\mu_4 = [2, 2]^T$, $\mu_5 = [0, 0]^T$ and
their covariance matrices are

$$\Sigma_1 = \Sigma_5 = \begin{bmatrix} 0.20 & 0.00 \\ 0.00 & 0.30 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 0.60 & 0.15 \\ 0.15 & 0.60 \end{bmatrix}, \quad \Sigma_3 = \begin{bmatrix} 0.40 & 0.00 \\ 0.00 & 0.25 \end{bmatrix}, \quad \Sigma_4 = \begin{bmatrix} 0.60 & 0.00 \\ 0.00 & 0.30 \end{bmatrix}$$


We have used a Gaussianity threshold of 0.95, and a convergence threshold of
0.001 for the EM algorithm. In order to evaluate the robustness of the proposed
algorithm, several outliers were added to the data set. The sample size for
estimating entropy through Parzen windows was 75. We have found that, despite this
small size, the entropy estimation is good enough. Our algorithm converges after 30
iterations, correctly finding the number of kernels. In Fig. 5 we show the
evolution of the algorithm. We have also applied the classical EM with 5 kernels. We
have performed 20 experiments with the same data, randomly placing the
kernels in each one. In 18 of the 20 experiments the classical EM found a local
maximum. The average number of iterations needed was 95 (250 being the
maximum and 23 the minimum). Only in two cases did the classical EM find the
global maximum, using 21 and 31 iterations respectively. Thus, our approach
addresses two basic problems of the classical EM: the initialization and the model
selection.

Fig. 5. Evolution of our algorithm from one initial kernel to 5 real kernels.

Finally, we have applied the proposed method to the well-known Iris [2]
data set, which contains 3 classes of 50 (4-dimensional) instances, each class
referring to a type of iris plant: Versicolor, Virginica and Setosa. Because the
problem is 4-dimensional, 50 samples are insufficient to construct the pdf using
Parzen windows. In order to test our method, we have generated 300 training
samples from the averages and covariances of the original classes and we have
checked the performance in a classification problem with the original 150 samples.
Starting with K = 1, the method correctly selected K = 3. Then, a maximum a
posteriori classifier was built, with a classification performance of 98% (only
three Versicolor samples were classified as Virginica).

5 Conclusions and Future Work

In this paper we have presented a method, based on maximum entropy, for finding
the optimal number of kernels in a Gaussian mixture. We start the algorithm
with only one kernel and then we decide whether to split it on the basis of the
entropy of the underlying pdf. The algorithm converges in a few iterations and is
suitable for density estimation and classification problems. We are currently
validating this algorithm in real image classification problems and also exploring
new methods of estimating entropy directly, bypassing the estimation of the pdf.

Abstract. This paper proposes the use of Gaussian Mixture Models as
a supervised classifier for remote sensing multispectral images. The main
advantage of this approach is that it provides a better fit to several
statistical distributions, including non-symmetrical ones. We present some
results of applying this method to a real image of an area of the Tapajos
River in Brazil, and the results are analysed according to a reference image.
We also perform a comparison with the Maximum Likelihood classifier. The
Gaussian Mixture classifier obtained the best fit to the image data and the
best classification performance.



1 Introduction 

Researchers in image processing often justify the use of the Gaussian approach
by the volume of data. In the practice of image classification, this approach
should not be applied without verifying the Gaussian distribution hypothesis.
Otherwise, it can result in a low percentage of correct classification, for
example when the Maximum Likelihood classifier is used. To minimize this effect,
several approaches have been tried, using fuzzy membership functions [1], multiple
classifiers [2], Neural Networks [3], expert systems [4], etc.

Gaussian Mixture Models (GMM) can, in theory, be applied to model
a large number of statistical distributions, including non-symmetrical
distributions. These models have been used for data classification [5] and speech
and speaker recognition [6]. In image processing applications, researchers commonly
use the unsupervised version of mixture models [7], [8].

In this paper, we propose the use of ellipsoidal GMM [9] as a supervised
classifier for remote sensing multispectral images. We applied this method
to a real image of an area of the Tapajos River in Brazil and compared the results
with a Maximum Likelihood classifier.

2 Theoretical Aspects

In this section, we present the Gaussian Mixture Models method for image
classification. The clusters associated with the mixture components are ellipsoidal,
centered at the means $\mu_i$, and the covariance matrices $\Sigma_i$ determine the geometric
characteristics of the ellipses. Parameter estimation equations for the class models
are presented first. The GMM method for classification is then described
as a maximum likelihood classifier [6].



2.1 Gaussian Mixture Models (GMM) 

Let $X = \{x_1, x_2, \ldots, x_T\}$ be a set of $T$ vectors, extracted from attribute space,
obtained from the sample areas of the image. This information can be extracted
at $T$ different spectral bands of the image. Since the distribution of these vectors
is unknown, it is approximately modeled by a mixture of Gaussian densities as
the weighted sum of $c$ component densities, given by the equation

$$p(x_t|\lambda) = \sum_{i=1}^{c} w_i\, N(x_t, \mu_i, \Sigma_i), \quad t = 1, \ldots, T \qquad (1)$$

where $\lambda$ denotes a prototype consisting of a set of model parameters
$\lambda = \{w_i, \mu_i, \Sigma_i\}$, $w_i$, $i = 1, \ldots, c$ are the mixture weights and $N(x_t, \mu_i, \Sigma_i)$ are the
$T$-variate Gaussian component densities with mean vectors $\mu_i$ and covariance
matrices $\Sigma_i$:

$$N(x_t, \mu_i, \Sigma_i) = \frac{\exp\left\{-\frac{1}{2}(x_t - \mu_i)'\, \Sigma_i^{-1}\, (x_t - \mu_i)\right\}}{(2\pi)^{T/2}\, |\Sigma_i|^{1/2}} \qquad (2)$$

To train the GMM, these parameters are estimated such that they best match
the distribution of the samples from the image. Maximum likelihood
estimation is widely used as a training method. For a sequence of sample vectors $X$
and a model $\lambda$, the likelihood of the GMM is given by:

$$p(X|\lambda) = \prod_{t=1}^{T} p(x_t|\lambda). \qquad (3)$$

The aim of maximum likelihood estimation is to find a new parameter model
$\bar{\lambda}$ such that $p(X|\bar{\lambda}) \ge p(X|\lambda)$. Since the expression in (3) is a nonlinear
function of the parameters in $\lambda$, its direct maximisation is not possible. However,
these parameters can be obtained iteratively using the Expectation-Maximisation
algorithm [10]. In this algorithm, we use an auxiliary function $Q$ given by:

$$Q(\lambda, \bar{\lambda}) = \sum_{t=1}^{T} \sum_{i=1}^{c} p(i|x_t, \lambda)\, \log\left[\bar{w}_i\, N(x_t, \bar{\mu}_i, \bar{\Sigma}_i)\right] \qquad (4)$$
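
where $p(i|x_t, \lambda)$ is the a posteriori probability for each mixture component of
image class $i$, $i = 1, \ldots, c$ and satisfies

$$p(i|x_t, \lambda) = \frac{w_i\, N(x_t, \mu_i, \Sigma_i)}{\sum_{k=1}^{c} w_k\, N(x_t, \mu_k, \Sigma_k)}. \qquad (5)$$

The Expectation-Maximisation algorithm is such that if $Q(\lambda, \bar{\lambda}) \ge Q(\lambda, \lambda)$
then $p(X|\bar{\lambda}) \ge p(X|\lambda)$ [13]. Setting the derivatives of the $Q$ function with respect to
$\bar{\lambda}$ to zero, we find the following reestimation formulas:

$$\bar{w}_i = \frac{1}{T}\sum_{t=1}^{T} p(i|x_t, \lambda), \qquad (6)$$

$$\bar{\mu}_i = \frac{\sum_{t=1}^{T} p(i|x_t, \lambda)\, x_t}{\sum_{t=1}^{T} p(i|x_t, \lambda)}, \qquad (7)$$

$$\bar{\Sigma}_i = \frac{\sum_{t=1}^{T} p(i|x_t, \lambda)\, (x_t - \bar{\mu}_i)(x_t - \bar{\mu}_i)'}{\sum_{t=1}^{T} p(i|x_t, \lambda)}. \qquad (8)$$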

The algorithm for training the GMM is described as follows:

1. Generate the a posteriori probability $p(i|x_t, \lambda)$ at random satisfying (5);

2. Compute the mixture weight, the mean vector, and the covariance matrix
following (6), (7) and (8);

3. Update the a posteriori probability $p(i|x_t, \lambda)$ according to (5) and compute
the $Q$ function using (4);

4. Stop if the increase in the value of the $Q$ function at the current iteration
relative to the value of the $Q$ function at the previous iteration is below a chosen
threshold, otherwise go to step 2.



The GMM classification. To provide GMM classification, we need a model $\lambda$ for
each class of the image. So, let $\lambda_k$, $k = 1, \ldots, N$ denote the models of $N$ possible
classes of image. Given a feature vector sequence $X$, a classifier is designed to
classify $X$ into $N$ classes of image by using $N$ discriminant functions $g_k(X)$,
computing the similarities between the unknown $X$ and each class of image $\lambda_k$ and
selecting the class of image $\lambda_{k^*}$ if

$$k^* = \arg\max_{1 \le k \le N} g_k(X) \qquad (9)$$

In the minimum-error-rate classifier, the discriminant function is the a
posteriori probability:

$$g_k(X) = p(\lambda_k|X). \qquad (10)$$

We can use Bayes' rule

$$p(\lambda_k|X) = \frac{p(\lambda_k)\, p(X|\lambda_k)}{p(X)} \qquad (11)$$

and we can assume equal prior probabilities for all classes, i.e., $p(\lambda_k) = 1/N$. Since
$p(X)$ is the same for all class models, the discriminant function in (10)
is equivalent to the following [14]:

$$g_k(X) = p(X|\lambda_k) \qquad (12)$$

Finally, using the log-likelihood, the decision rule used for class
identification is:

Select class model $k^*$ if

$$k^* = \arg\max_{1 \le k \le N} \sum_{t=1}^{T} \log p(x_t|\lambda_k) \qquad (13)$$

where $p(x_t|\lambda_k)$ is given by (1) for each $k$, $k = 1, \ldots, N$.
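
The decision rule can be sketched as follows for the single-band (one-dimensional) case. The class names and mixture parameters below are hypothetical placeholders, not the models estimated in the case study; the sketch only shows how the summed log-likelihoods are compared across class models.

```python
import math

def gmm_log_density(x, weights, means, variances):
    """log p(x | lambda) for a 1-D Gaussian mixture."""
    terms = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(samples, class_models):
    """Pick the class whose model gives the largest summed log-likelihood."""
    scores = {name: sum(gmm_log_density(x, *model) for x in samples)
              for name, model in class_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical class models: (weights, means, variances).
models = {
    "class 1": ([0.5, 0.5], [10.0, 20.0], [4.0, 4.0]),   # two-component mixture
    "class 2": ([1.0], [60.0], [25.0]),                  # single Gaussian
}
print(classify([12.0, 9.5], models)[0])
print(classify([58.0], models)[0])
```

Because equal priors are assumed, no prior term appears in the scores; with unequal priors one would add $\log p(\lambda_k)$ to each sum.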

2.2 The Bayesian Information Criterion 

Several information criteria are presented in the literature, such as the Bayesian
Information Criterion (BIC) [9] and the Akaike Information Criterion (AIC) [11].
However, after comparing several information criteria, Biernacki and Govaert [12]
concluded that BIC performs better than the others in choosing a good mixture
model.

The Bayesian Information Criterion (BIC) was proposed by Schwarz [9] to
evaluate the quality of estimations. This criterion compares models with different
parameters or different numbers of components. Formally, given a data set $X$ and a
model $m$, $p(X|\lambda_m)$, $m = 1, 2, \ldots$ with $c$ components, $c = 1, 2, \ldots$, the BIC for this
model is given by:

$$BIC(m, c) = -2\log[p(X|\lambda_m)] + \nu_{m,c}\, \ln(T) \qquad (14)$$

where $p(X|\lambda_m)$ is given by (3), $\nu_{m,c}$ is the number of independent parameters to
be estimated in the model $m$ with $c$ components and $T$ is the number of vectors
in $X$.

With this sign convention, the best model to represent the data set $X$ is the
model $m^*$ such that

$$m^* = \arg\min_{m} BIC(m, c) \qquad (15)$$
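
A small numerical sketch may help to see how the criterion trades fit against complexity. With the sign convention of Equation 14, in which the log-likelihood enters with a factor of -2 and the penalty is added, the preferred model is the one with the smallest criterion value. The log-likelihood values below are invented for illustration, and the parameter count assumes one-dimensional mixtures.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """-2 log p(X | lambda_m) + nu * ln(T)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def free_params_1d(c):
    """Free parameters of a 1-D mixture: c means, c variances, c - 1 weights."""
    return 3 * c - 1

# Hypothetical maximised log-likelihoods for c = 1, 2, 3 components.
log_likelihoods = {1: -520.0, 2: -480.0, 3: -478.5}
T = 200  # hypothetical number of training vectors

scores = {c: bic(ll, free_params_1d(c), T) for c, ll in log_likelihoods.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this made-up example the third component improves the likelihood only marginally, so the penalty term makes the two-component model the preferred one.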



3 A Case Study 

To test the classifier proposed in Sect. 2.2, we performed a case study with a
real image of the Tapajos River in the Brazilian state of Para. The image was
obtained from spectral sensor number 4 of the Landsat satellite. The image comprised
three classes: Tapajos River (class 1), Contact Areas (class 2) and Human
Occupation Areas (class 3).

To evaluate the quality of estimations, we use the Bayesian Information Criterion
(BIC) [9], which compares models with different parameters or different numbers
of components. This criterion points to the best model. So, according to the BIC,
classes 1 and 3 were modeled using two components (K = 2), because their
original data distributions were not Gaussian, and class 2 was
modeled using one component. Fig. 1 shows all the models.






Fig. 1. Gaussian Mixture Models for classes 1, 2 and 3. Class 1 (a) was modeled using
K = 2, class 2 (b) was modeled using K = 1 and class 3 (c) was modeled using K = 2.



A statistical comparison between the classification and a reference map of
the same area was performed using the Kappa coefficient. The Kappa result was
90.6793% with variance $9.5815 \times 10^{-5}$. The Kappa coefficients for each class
were: 99.6929% for class 1, 84.5049% for class 2 and 86.7262% for class 3. The
percentages of correct classification for each class were, respectively, 99.8134%,
89.3032% and 90.6801%. The main misclassification occurred between classes 2 and
3. In fact, the distributions of those classes overlap. Table 1
shows the classification matrix used for the calculations presented above.

We can note, in Table 1, that only one pixel of class 1 was classified as
class 2, 46 pixels of class 2 were classified as class 3 and 37 pixels of class 3 were
classified as class 2. The cause is the overlap of the distributions of classes 2
and 3.

Thus, the classification can be considered satisfactory in statistical terms,
even considering the overlap of the class 2 and class 3 distributions. It is
important to note that these results were obtained using only one original spectral
band.

Table 1. Classification matrix for the GMM classifier using a reference map.

Classes    class 1   class 2   class 3
class 1        535         1         0
class 2          0       384        46
class 3          0        37       360
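
As a check on the figures quoted in the text, the overall Kappa coefficient can be recomputed from the counts of Table 1. The sketch below uses the standard Cohen's Kappa formula and assumes that the rows of the matrix are the reference classes and the columns the assigned classes, which is consistent with the per-class percentages reported.

```python
def kappa(confusion):
    """Cohen's Kappa for a square confusion matrix (rows: reference, columns: assigned)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1.0 - expected)

table_1 = [
    [535, 1, 0],
    [0, 384, 46],
    [0, 37, 360],
]
print(100.0 * kappa(table_1))
```

The result is about 90.68%, matching the overall Kappa value reported for the GMM classifier.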






Fig. 2. Maximum Likelihood Models for classes 1 (a), 2 (b) and 3 (c). All classes were 
modeled as a Gaussian distribution. 



4 Comparison with a Maximum Likelihood Classifier 

To compare the performance of the GMM classifier with that of another classifier,
we performed a classification using a Maximum Likelihood classifier. Using the same
reference map, the Kappa coefficient for the Maximum Likelihood
classification was 89.2355% with variance $1.09389 \times 10^{-4}$. The Kappa coefficients
for each class were: 99.0809% for class 1, 84.2878% for class 2 and 82.9756% for
class 3. The percentages of correct classification for each class were, respectively,
99.4403%, 89.3032% and 87.9093%. Again, the overlap of the distributions of classes
2 and 3 was responsible for the misclassification. Table 2 shows the classification
matrix used for the calculations presented above.

We can note, in Table 2, that 3 pixels of class 1 were classified as class 2, 46
pixels of class 2 were classified as class 3 and 48 pixels of class 3 were classified as
class 2. Despite the overlap of the distributions of classes 2 and 3, we can observe
that Maximum Likelihood has obtained satisfactory results too. However, it is
important to note that none of these results is better than those of the GMM
classifier. Fig. 2 shows all the models.

Table 2. Classification matrix for the Maximum Likelihood classifier using a reference map.

Classes    class 1   class 2   class 3
class 1        533         3         0
class 2          0       384        46
class 3          0        48       349


5 Conclusions and Future Works 

In this paper, we proposed the use of Gaussian Mixture Models for supervised
image classification. We obtained results of a real data classification using an
area of the Tapajos River in Brazil. We compared this classifier with the
Maximum Likelihood classifier.

The Gaussian Mixture Models classifier is suitable for classifying images whose
classes cannot be fitted by Gaussian distributions. These models provide
a better fit to several distributions, with lower variance. This property
is particularly welcome in image classification, because it reduces
misclassification, and it is already exploited in other applications, such as data
analysis and speech and speaker recognition.

We presented an application using only one original spectral image. In this
application we observed better statistical results for the GMM classifier. It is
important to note that classes 1 and 3 do not have a Gaussian distribution, as shown
in Fig. 1 and Fig. 2. This fact helps to explain the inferior performance
of the Maximum Likelihood classifier, mainly in classifying class 3.

As future work we intend to carry out a statistical performance comparison with
other statistical classifiers.

Abstract. In this paper we present a way to reduce the computational cost of
k-NN classifiers without losing classification power. Hierarchical or multistage
classifiers have been built for this purpose. These classifiers are designed by
putting incrementally trained classifiers into a hierarchy and using rejection
techniques in all the levels of the hierarchy apart from the last. Results are
presented for different benchmark data sets: some standard data sets taken from the
UCI Repository and the Statlog Project, and NIST Special Databases (digits and
upper-case and lower-case letters). In all the cases a computational cost reduction
is obtained while maintaining the recognition rate of the best individual classifier
obtained.



1 Introduction 

Pattern classification based on the k-nearest-neighbors (k-NN) [1], [2] approach has
been proven to perform well in many domains. It is well
known that the main drawback of this kind of classifier in practice is its
computational cost. In order to find the k nearest patterns, a dissimilarity measure
between the test sample and a large number of samples in the training set is
computed, making this approach computationally expensive. One method for solving
this problem is to reduce the large data set to a small and representative one using
condensing algorithms [4]. The objective of selecting a subset of the original
training data is computational efficiency in the classification phase, and/or making
the resulting classification and generalization more reliable. Other kinds of methods
try to reduce the data set by generating artificial prototypes summarizing
representative characteristics of similar instances [10]. Many works have shown that
hierarchical systems are also good at reducing the computational cost in classifiers
with a large data set. Besides, in recent years there has been an increasing interest
in schemes that combine multiple classifiers [5].

In this paper, multistage classifiers based on k-NN classifiers are proposed for the
reduction of the computational cost. The multistage classifiers are built with l levels
of individual k-NN classifiers trained incrementally and using rejection techniques in
the levels. Each of the individual classifiers is built with different characteristics,
and the computational cost of classifying a pattern is proportional to the level where
the pattern has been recognized. When a pattern does not have enough classification
certainty in a level of the hierarchy, it goes to the next level, where a more complex
classifier will try to classify it. Many of the patterns are classified in the lowest
levels of the hierarchy, where the computational cost is low because the simplest
classifiers have been put there.

We have used different benchmark data sets to validate our system. Three data sets 
have been selected from the UCI Repository and the Statlog project [6]. The results 
obtained with these data sets are put together with previously obtained results with 
characters of the NIST Special Databases [12]: digits, upper-case and lower-case let- 
ters. 

The paper has the following structure. In Section 2 a formal description of the 
multistage system is presented. In Section 3 we describe the particular classifiers used 
and the experimental results obtained. Finally in section 4 the conclusions are pre- 
sented. 



2 Formal Description of the System 

We are going to give now some definitions and the notation used throughout the pa- 
per. After that the algorithms used in the training and recognition process to build the 
multistage system are presented. 

A pattern is represented by $x_i$ and is identified by the ordered pair $(r_i, c_i)$, where
$r_i = (r_{i1}, \ldots, r_{id})$ are the d feature values that represent the pattern $x_i$ and
$c_i$ is the class label assigned to the pattern.

The Training Set formed by all the training patterns, TS, is defined by
$TS = \{x_i \mid x_i = (r_i, c_i)\}$, and the Reduced Set, RS, is a subset of TS obtained
with a learning or training algorithm.

The following operators will appear in the next sections:

• $C(x_i) = c_i$ is the class of the pattern $x_i$.

• $d(x_i, x_j) = \lVert r_i - r_j \rVert$ is the Euclidean distance between the two feature
vectors $(r_i, r_j)$ associated with the patterns $x_i$ and $x_j$ respectively.

• $D(x) = \min\{d(x, x_j)\}\ \forall x_j \in RS \wedge C(x_j) = C(x)$ is the distance to the
nearest pattern to x belonging to the same class.

• $\neg D(x) = \min\{d(x, x_j)\}\ \forall x_j \in RS \wedge C(x_j) \neq C(x)$ is the distance to
the nearest pattern to x belonging to a different class; $\neg D(x) = \infty$ if
$C(x_j) = C(x)\ \forall x_j \in RS$.

• $NN_k(x, S) = (x_1, x_2, \ldots, x_k) \mid x_i \in S,\ 1 \le i \le k$ is the distance-ordered
set of the k nearest patterns to x in the set S.



2.1 Classic Training Algorithm (TA)

The training algorithm shown in Fig. 1 generates a reduced data set RS from the
training set TS. Two rules, specified as (1) and (2) in the figure, are applied to
obtain the reduced set. The reduction obtained depends on the parameter tt (training
threshold). The threshold can take values between 0 and infinity. If the threshold
takes the value 0, the maximum reduction is obtained and this algorithm becomes Hart's
algorithm [4]. If the value assigned to this parameter is infinity, there is no
reduction and RS and TS are the same data set.



    Initializations
      RS = ∅
      training = true
    while (training)
      training = false
      ∀ x_i ∈ TS
        (1): if (C(x_i) ≠ C(NN_1(x_i, RS)))
               RS = RS ∪ x_i
               training = true
             else
        (2):   if ((¬D(x_i) - D(x_i)) / D(x_i) < tt)
                 RS = RS ∪ x_i
                 training = true
      end ∀
    end while

Fig. 1. Training Algorithm (TA)

    Initializations
      IRS_i = IRS_{i-1} with IRS_0 = ∅
      training = true
    while (training)
      training = false
      ∀ x_j ∈ ITS_i
        (1): if (C(x_j) ≠ C(NN_1(x_j, IRS_i)))
               IRS_i = IRS_i ∪ x_j
               training = true
             else
        (2):   if ((¬D(x_j) - D(x_j)) / D(x_j) < tt)
                 IRS_i = IRS_i ∪ x_j
                 training = true
      end ∀
    end while

Fig. 2. Incremental Training Algorithm (ITA)



2.2 Incremental Training Algorithm (ITA)

The different classifiers used to build a hierarchy have been trained using an
incremental training algorithm, which is a modification of the algorithm shown in the
previous section. The training set TS is divided into s subsets:

$TS = \bigcup_{i=1}^{s} TS_i, \quad TS_i \cap TS_j = \emptyset \quad \forall i, j \mid i \neq j.$    (1)

The incremental training set i, $ITS_i$, is defined as follows:

$ITS_i = \bigcup_{j=1}^{i} TS_j; \quad ITS_i = ITS_{i-1} \cup TS_i$ with $ITS_0 = \emptyset.$    (2)

The cardinality of $ITS_i$ follows an exponential and uniform distribution.

The Incremental Training Algorithm (ITA) creates an incremental reduced set,
$IRS_i$, from the incremental training set, $ITS_i$, and the incremental reduced set i-1,
$IRS_{i-1}$, by applying the same two rules shown previously in Fig. 1. The incremental
algorithm is shown in Fig. 2.

Having used this algorithm, we can use the following property in the recognition
process:

$\forall i > 0, \quad IRS_{i-1} \subseteq IRS_i$ with $IRS_0 = \emptyset.$    (3)

The incremental training algorithm reduces the computational cost of the training
process because the training starts from an already built classifier and only some new
patterns are learnt to create the new classifier.
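
The two training rules and their incremental use can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `train`, the tuple representation of patterns, the handling of an empty reduced set (treated as a misclassification) and the skipping of already stored patterns are all assumptions made for the example.

```python
import math

def train(ts, tt, rs=None):
    """Condense `ts`, a list of (features, label) tuples, into a reduced set.

    Passing a previously obtained reduced set as `rs` gives the incremental
    variant: training resumes from an already built classifier.
    """
    rs = list(rs) if rs else []
    stored = set(rs)
    training = True
    while training:
        training = False
        for feats, label in ts:
            if (feats, label) in stored:
                continue  # assumption: stored patterns are not re-examined
            same = [math.dist(feats, f) for f, c in rs if c == label]
            other = [math.dist(feats, f) for f, c in rs if c != label]
            d_same = min(same, default=math.inf)    # D(x)
            d_other = min(other, default=math.inf)  # not-D(x)
            # Rule (1): the nearest stored pattern belongs to another class
            # (assumption: an empty reduced set counts as a misclassification).
            rule1 = d_other < d_same or not rs
            # Rule (2): correctly classified, but the relative margin is below tt.
            rule2 = (not rule1 and d_same > 0
                     and (d_other - d_same) / d_same < tt)
            if rule1 or rule2:
                rs.append((feats, label))
                stored.add((feats, label))
                training = True
    return rs
```

Calling `train(its_i, tt, rs=irs_prev)` with the previous reduced set only ever appends patterns, so the inclusion property of Equation 3 holds by construction in this sketch.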

2.3 Multistage Recognition Algorithm (MRA) 

The multistage classification process is obtained using classifiers trained in an
incremental way and putting them into a hierarchy ordered according to the number of
patterns they have. Given a pattern x, we define a confidence value, $CV_{i,k_i}(x)$, used
to decide whether the pattern x has enough classification certainty to be classified
in level i of the hierarchy (this rule is known as the (k, l)-Nearest Neighbor rule
[1]). The expression can be seen in Equation 4.

$k_i$ is the number of nearest neighbors taken in level i, and Equation 5 gives the
expression that determines how many of these patterns belong to class c.

The multistage recognition algorithm is shown in Fig. 3. This recursive algorithm
starts with the call MRA(x, 1) and represents the classification process of a pattern
x of the test set. This algorithm reduces the computational cost of the classification
process, because the discriminating capacity of the hierarchical classifier is adapted
to the recognition difficulty of each pattern.



    MRA(x, i)
    {
      if (i = max_level) return WR_{i,k_i}(x);
      else
        if (CV_{i,k_i}(x) > rt_i) return c
        else return MRA(x, i + 1);
    }

    max_level is the number of levels
    rt_i is the rejection threshold associated with level i.

Fig. 3. Multistage Recognition Algorithm



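When the pattern reaches the last level of the hierarchy, we use a modified version
of the Weighted rule $WR_{i,k_i}(x)$ [1] (Equation 6) to classify it.

(4) [definition of the confidence value $CV_{i,k_i}(x)$; expression not recoverable from the source]

(5) [number of the $k_i$ nearest patterns that belong to class c; expression not recoverable from the source]

(6) $WR_{i,k_i}(x) = \max_c(\cdot)$ [argument of the maximum not recoverable from the source]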
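
The control flow of the multistage recognition algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the confidence value is taken here to be the fraction of the k nearest patterns that vote for the majority class, and the last-level rule is taken to be an inverse-distance weighted vote. Neither form is confirmed by the text above, and all function names are hypothetical.

```python
import math
from collections import Counter

def k_nearest(x, patterns, k):
    """Return the k patterns (features, label) closest to x."""
    return sorted(patterns, key=lambda p: math.dist(x, p[0]))[:k]

def confidence(x, patterns, k):
    """Assumed confidence: share of the k nearest patterns in the majority class."""
    nearest = k_nearest(x, patterns, k)
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

def weighted_rule(x, patterns, k):
    """Assumed last-level rule: inverse-distance weighted vote."""
    scores = Counter()
    for feats, label in k_nearest(x, patterns, k):
        scores[label] += 1.0 / (math.dist(x, feats) + 1e-12)
    return scores.most_common(1)[0][0]

def mra(x, levels, ks, rts, i=0):
    """Classify x; levels[i] is the reduced set of level i (0-based here).

    Returns (class, index of the level where x was recognized).
    """
    if i == len(levels) - 1:
        return weighted_rule(x, levels[i], ks[i]), i
    label, cv = confidence(x, levels[i], ks[i])
    if cv > rts[i]:
        return label, i          # enough certainty: stop at this level
    return mra(x, levels, ks, rts, i + 1)  # reject and try the next level
```

A pattern that is rejected at every intermediate level always receives a label at the last one, since no rejection threshold is applied there.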
2.4 Multistage Recognition Algorithm with Active Memory (MRAMA)

In order to reduce the computational cost of the previous algorithm, a modification
has been introduced. This new algorithm keeps memory of what happened in the
previous levels [3] of the hierarchy and uses it when a pattern is going to be classified
in a later level. This is a memory-based method or a method with "active memory".
In particular, the information known about the previous levels is the set of classes
found among the k nearest patterns of that level, referred to as $CS_{i,k_i}$. The
expression for this set is given in Equation 7.

$CS_{i,k_i}(x) = \{C(x_j) \mid x_j \in NN_{k_i}(x, IRS_i) \wedge C(x_j) \in CS_{i-1,k_{i-1}}(x)\} \quad \forall i > 0$    (7)

where $CS_{0,k_0}$ contains all the possible classes.

The description of the new algorithm is shown in Fig. 4. The algorithm starts with
the call MRAMA(x, 1, $CS_{0,k_0}$). The only difference between the expressions
$MAWR_{i,k_i}(x)$ and $MACV_{i,k_i}(x)$ and the expressions $WR_{i,k_i}(x)$ and $CV_{i,k_i}(x)$
presented in the previous section is that in the new expressions the $k_i$ patterns are
searched only among the patterns of the classes in the set $CS_{i-1,k_{i-1}}$, which
reduces the cost.



    MRAMA(x, i, CS_{i-1,k_{i-1}})
    {
      if (i = max_level) return MAWR_{i,k_i}(x);
      else
        if (MACV_{i,k_i}(x) > rt_i) return c
        else return MRAMA(x, i + 1, CS_{i,k_i});
    }

    max_level is the number of levels
    rt_i is the rejection threshold associated with level i.

Fig. 4. Multistage Recognition Algorithm with Active Memory



The computational cost associated with recognizing a pattern x in level i using this
algorithm and incremental training is defined by the following expression:

$RCC_i(x) = RCC_{i-1}(x) + \left|\{x_j \in (IRS_i - IRS_{i-1}) \wedge C(x_j) \in CS_{i-1,k_{i-1}}(x)\}\right| \quad \forall i > 1 \ \wedge\ RCC_1(x) = |IRS_1|.$    (8)

The average computational cost of the hierarchical or multistage classifier with L
levels, taking a test pattern set X, is defined in Equation 9:

$RCC(X) = \frac{1}{|X|} \sum_{j=1}^{|X|} RCC_{l(x_j)}(x_j)$    (9)

where $l(x_j)$ is the level where the pattern $x_j$ has been recognized.
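
The class-set filtering and the cost count can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes each incremental reduced set is stored as a list whose first elements are exactly the previous reduced set (so the new patterns are a list suffix), and the helper names `class_set`, `recognition_cost` and `average_cost` are hypothetical.

```python
import math

def class_set(x, irs_i, k, prev_classes):
    """Classes found among the k nearest patterns, searched only among
    patterns whose class was kept by the previous level."""
    candidates = [p for p in irs_i if p[1] in prev_classes]
    nearest = sorted(candidates, key=lambda p: math.dist(x, p[0]))[:k]
    return {label for _, label in nearest}

def recognition_cost(irs, class_sets, level):
    """Number of distance computations needed to recognize a pattern at `level` (1-based).

    irs[i-1] is the reduced set of level i, with the previous level's set as prefix.
    class_sets[i-1] is the class set passed on by level i.
    """
    cost = len(irs[0])                       # level 1 checks every pattern
    for i in range(2, level + 1):
        new = irs[i - 1][len(irs[i - 2]):]   # patterns added at level i
        allowed = class_sets[i - 2]          # classes kept by level i-1
        cost += sum(1 for _, label in new if label in allowed)
    return cost

def average_cost(costs):
    """Average cost over a test set, given the per-pattern costs."""
    return sum(costs) / len(costs)
```

Only the patterns added at each level and belonging to a surviving class are counted, which is where both the incremental training and the active memory reduce the cost.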
3 Experimental Results

The experimentation has been carried out with different data sets: Statlog LandSat 
Satellite (Satimage), Statlog Shuttle (Shuttle) and UCI Letter Recognition database 
(Letter), taken from the UCI Repository and the Statlog project. Table 1 presents the 
characteristics of these benchmark data sets. 



Table 1. Benchmark data sets used

              TS        Test Set   Classes   Features
    Shuttle   43,500    14,500     7         9
    Letters   15,600    4,400      26        16
    Satimage  4,435     2,000      7         36


The first experiments have been carried out for individual classifiers using the
training algorithm TA and the $WR_{i,k_i}(x)$ rule. Different training thresholds and TS
sizes have been tested in order to find the classifier with the best recognition rate.
For the Satimage data set, 3 TS of 1,200, 2,400 and 4,435 patterns respectively have
been defined. For the Shuttle data set, 5 TS have been used, with 2,100, 4,200, 8,400,
16,800 and 43,500 patterns. Finally, for the Letter data set, 4 TS have been defined,
with 2,600, 5,200, 10,400 and 15,600 patterns. For all the data sets, a bigger TS
contains the patterns of the smaller ones. Table 2 shows the best results obtained for
these three data sets and the characteristics of the classifiers that provide these
results. The recognition rate achieved is similar to that presented by other authors
[7], [11]. Together with the results of these three benchmarks, the results previously
obtained with characters of the NIST Special Databases, digits [8] and letters [9],
are presented. We can see that the best results for all the data sets are obtained
with a reduced set of the biggest TS used. Since only some patterns are selected, a
computational cost reduction is obtained.



Table 2. Results obtained with individual classifiers

                       TS        RS       Recognition   tt      [7]     [11]
    NIST, digits       160,000   23,493   99.50         0.5%    -       -
    NIST, upper-case   60,194    24,348   95.44         0.5%    -       -
    NIST, lower-case   49,842    24,503   88.34         0.5%    -       -
    Shuttle            43,500    567      99.88         0.75%   99.86   99.94
    Letters            15,600    8,406    95.73         0.75%   96.6    68.30
    Satimage           4,435     1,924    91.15         0.25%   90.15   86.25


The best recognition rate presented can be obtained with less computational cost if
we use the multistage classifiers with incremental training and active memory. We
have built hierarchies with a different number of levels for each data set, depending
on the number of training subsets we have. The TS subsets created for each data set
have now been trained incrementally, using the previously trained subset as described
in Section 2.2, and put into a hierarchy ordered by the number of patterns they have.
The tt used to obtain each IRS has been 0, except for the last IRS. The last TS (the
one with the most patterns) has been trained using the value 0 for the tt and also
using the tt which provides the best recognition rate. The incremental training helps
to reduce the computational cost in the recognition process because in each level of
the hierarchy only the new patterns of the classifier need to be checked.

These classifiers have been put into a hierarchy and, applying the algorithm
presented in Section 2.4, an important computational cost reduction has been obtained
while maintaining the hit rate presented before. (The computational cost is calculated
with the expression in Equation 8 and compared with the computational cost of the
individual classifier in the last level, which gives the same recognition rate.) These
results are shown in Table 3. In this case too, the results obtained with the NIST
Databases are presented in order to make a comparison.



Table 3. Results obtained with multistage and incremental classifiers

                      Levels   Recognition   Computational cost / Speed-up
    NIST digits       12       99.48         957.69 / 167
    NIST upper-case   12       95.44         744.23 / 80.88
    NIST lower-case   12       88.12         1145 / 43.53
    Shuttle           6        99.88         329 / 132.2
    Letters           5        95.54         4070 / 3.82
    Satimage          2        91.5          1093 / 4.05


We can observe that, although there is always a computational cost reduction
(speed-up) when applying the hierarchical recognition algorithm, the reduction is not
very high when the data set contains few patterns. In problems with a large TS or in
easy problems, the computational cost reduction is higher.

Besides, for the Satimage data set we have selected a classifier with two levels,
although we initially took a classifier with 4 levels as described before. The first
two classifiers in this case had very few patterns, so they rejected all the patterns
and selected an incorrect class set that increased the error rate in the next levels.
For this reason these first classifiers have been removed from the hierarchy, building
a hierarchy of only 2 levels.



4 Conclusions 

It is well known that the k-NN classifier provides good recognition results in pattern
recognition systems of different domains. In this paper we have presented a multistage
incremental algorithm that reduces the computational cost associated with k-NN
classifiers, solving one of the most important problems of these algorithms.

The multistage classifier is built with different k-NN classifiers (with a different
number of patterns used in the classification) trained in an incremental way. The
discriminating capacity of the classifiers is adapted to the classification needs of
each pattern in the classification process; the easiest patterns are classified in the
first levels of the hierarchy, with low computational cost. Only the most difficult
patterns reach the last level of the hierarchy, where the classifiers with more
patterns have been put.

The experimental results have been presented for different benchmark data sets. We
have observed that, although a computational cost reduction is always obtained while
maintaining the recognition rate of the best classifiers, the speed-up is higher when
large training sets are taken. The speed-up obtained with data sets such as the NIST
digits database, where we have taken 160,000 patterns for training, is 167 times,
whereas the reduction for data sets such as Satimage, with 4,435 training patterns, is
4.05 times. In all cases a good recognition rate is maintained.
Abstract. Nearest neighbour search is one of the simplest and most widely used
techniques in Pattern Recognition. In this paper we are interested in tree
based algorithms that only make use of the metric properties of the space.
One of the best known and most referenced methods in this class was proposed
by Fukunaga and Narendra in the 70's.

This algorithm uses a tree that is traversed at search time and uses some
elimination rules to avoid the full exploration of the tree.

This paper proposes two main contributions: two new ways of constructing
the tree and two new elimination rules. As shown in the experiment section,
both techniques significantly reduce the number of distance computations.

Keywords: Nearest Neighbour, Metric Spaces, Elimination rule, Pattern
Recognition.



1 Introduction 

Nearest neighbour search (NNS) is a simple technique that is very popular in problems
related to classification. The NNS method consists of finding the nearest point
from a prototype set to a given sample point using a distance function [3].

To avoid the exhaustive search, many algorithms have been developed in
the last thirty years [1]. One of the most popular and most referenced algorithms
was proposed by Fukunaga and Narendra [4].

This algorithm builds, at preprocessing time, a tree that is traversed at search
time using some elimination rules to avoid the exploration of some branches.

The algorithm does not make any assumption about the way the points are
coded. It can be used in any metric space, that is, the distance function has to
fulfil the following conditions:

- $d(x, y) \ge 0$ ($= 0$ if $x = y$).

- $d(x, y) = d(y, x)$ (symmetry).

- $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

* The authors thank the Spanish CICyT for partial support of this work through 
project TIC2000-1703-C03-02. 
Although some recently proposed algorithms are more efficient, the Fukunaga
and Narendra algorithm is a basic reference in the literature, and new rules
developed to improve its main steps can be easily extended to other tree based
algorithms [9] [10] [2] [6] [7].

In this paper we propose two new ways of building the tree and two new
elimination rules.

2 The Fukunaga and Narendra Algorithm 

The Fukunaga and Narendra algorithm is a fast search method that uses a
hierarchical clustering to build a search tree where all the prototypes are stored. In
particular, it uses a divisive strategy, splitting the training data into l subsets.
Each subset is in turn divided into l subsets, and by applying this procedure
recursively a search tree is built. Fukunaga and Narendra proposed to use
the c-means algorithm at each step. Each node p of the tree represents a group of
samples and is characterised by the following parameters:

- $S_p$: set of samples

- $N_p$: number of samples

- $M_p$: representative of $S_p$

- $R_p = \max_{x_i \in S_p} d(x_i, M_p)$ (the radius of the node)

When an unknown sample x is given, the nearest neighbour is found by
searching the tree with a depth-first strategy. Among the nodes at the same level,
the node with the smallest distance $d(x, M_p)$ is searched first. Let n be the current
nearest neighbour to x among the prototypes considered so far; the
following two rules make it possible to avoid the search in the subtree p:

rule for internal nodes: no $y \in S_p$ can be the nearest neighbour to x if

$d(x, n) + R_p < d(x, M_p)$

rule for leaf nodes: $y \in S_p$ cannot be the nearest neighbour to x if

$d(x, n) + d(y, M_p) < d(x, M_p)$

In this work only binary trees with one point on the leaves^1 are considered.
In such a case the second rule becomes a special case of the first one. This rule
will be referred to as the Fukunaga and Narendra rule (FNR).
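
As an illustration of the rule for internal nodes, the following Python sketch computes the radius of a node and tests the pruning condition. The Euclidean distance and the function names `node_radius` and `fnr_prunes` are assumptions made for the example; the rule itself only needs a metric.

```python
import math

def node_radius(samples, rep):
    """R_p: largest distance from the representative to a sample of the node."""
    return max(math.dist(p, rep) for p in samples)

def fnr_prunes(x, n, rep, radius):
    """True if no sample of the node can be closer to x than the current
    nearest neighbour n, i.e. d(x, n) + R_p < d(x, M_p)."""
    return math.dist(x, n) + radius < math.dist(x, rep)
```

By the triangle inequality, every sample y of a pruned node satisfies d(x, y) >= d(x, M_p) - R_p > d(x, n), so discarding the node cannot miss the nearest neighbour.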

3 The Searching Tree 

Two approaches have been developed as alternatives to the well-known c-means
algorithm, which was used recursively by Fukunaga and Narendra in the construction
of the tree structure.

The first clustering strategy is called Most Separated Points (MSP), and
consists of the following steps:

- use as representatives of the two children of each node the two most separated
prototypes,

- classify the rest of the prototypes in the node of the nearest representative,

- recursively repeat the process until each final node has only one prototype, the
representative.

Fig. 1. Original elimination rule used in the algorithm of Fukunaga and Narendra
(FNR).

^1 In the Fukunaga and Narendra algorithm, leaf nodes can have more than one point.

The second clustering strategy is called Most Separated Father Point
(MSFP)^2, and consists of the following steps:

- randomly select a prototype as the representative of the root node,

- in the following level, use as representative of one of the nodes the
representative of the father node. The representative of the sibling node is the
prototype farthest from it among all the prototypes belonging to the father node,

- classify the rest of the prototypes in the node of the nearest representative,

- recursively repeat the process until each leaf node has only one point, the
representative.

Of course, the second strategy is not as symmetric as the first one and it
will produce deeper trees. On the other hand, this strategy makes it possible to avoid
the computation of some distances in the search procedure: as one of the
representatives is the same as that of the father, each time it is necessary to expand
a node, only one new distance computation is needed.
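
The MSFP construction can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes Euclidean points stored as tuples, pairwise distinct prototypes, and hypothetical names (`Node`, `build_msfp`, `leaves`).

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    rep: tuple                      # representative M_p
    points: list                    # samples S_p
    radius: float                   # R_p
    left: Optional["Node"] = None   # child that inherits the father's representative
    right: Optional["Node"] = None  # child represented by the farthest prototype

def build_msfp(points, rep=None, rng=random):
    """Build an MSFP tree; `points` must be distinct and `rep` one of them."""
    if rep is None:
        rep = rng.choice(points)    # random representative for the root
    node = Node(rep, points, max(math.dist(rep, p) for p in points))
    if len(points) > 1:
        far = max(points, key=lambda p: math.dist(rep, p))
        left_pts, right_pts = [], []
        for p in points:            # assign each prototype to the nearest representative
            closer_to_rep = math.dist(p, rep) <= math.dist(p, far)
            (left_pts if closer_to_rep else right_pts).append(p)
        node.left = build_msfp(left_pts, rep, rng)
        node.right = build_msfp(right_pts, far, rng)
    return node

def leaves(node):
    """Representatives stored in the leaves of the tree."""
    if node.left is None:
        return [node.rep]
    return leaves(node.left) + leaves(node.right)
```

Because the left child reuses the father's representative, a search that already knows the distance from the test sample to the father's representative only has to compute the distance to the right child's representative when it expands a node.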

4 The New Rules 

The elimination rules defined by Fukunaga and Narendra only make use of the
information between the node to be pruned and the hyperspherical surface centred
on the test sample, whose radius is the distance to the nearest point considered
so far.

In the proposed new rules, information related to the sibling node r is also
used to eliminate a node $\ell$.

^2 Note that this tree can be built in $O(n \log n)$, where n is the prototype set size.



4.1 The Sibling Based Rule (SBR) 

A first proposal requires that each node r stores the distance between the
representative of the node, $M_r$, and the nearest point to it, $e_\ell$, in the sibling
set $S_\ell$.

Fig. 2. Sibling based rule (SBR).

Definition of SBR: given a node r, a test sample x, a current candidate n to
be the nearest neighbour, and the point $e_\ell$ of the sibling node $\ell$ that is nearest
to the representative $M_r$, the node $\ell$ can be pruned if the following condition is
fulfilled:

$d(M_r, e_\ell) > d(M_r, x) + d(x, n)$

It is interesting to note that this rule does not need the distance between
x and $M_\ell$. This makes it possible to avoid some distance computations in the search
procedure^3.
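
A minimal Python sketch of the sibling based rule follows. The Euclidean distance and the names `sibling_gap` and `sbr_prunes` are assumptions for the example; `sibling_gap` stands for the precomputed distance between the representative of node r and the nearest point of the sibling node.

```python
import math

def sibling_gap(m_r, s_l):
    """Precomputed at tree-building time: distance from M_r to the nearest point of S_l."""
    return min(math.dist(m_r, p) for p in s_l)

def sbr_prunes(x, n, m_r, gap):
    """True if the sibling node can be pruned using only distances to M_r and n."""
    return gap > math.dist(m_r, x) + math.dist(x, n)
```

The check never evaluates the distance from x to the representative of the pruned node, which is the saving the text describes.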



4.2 Generalised Rule (GR) 

This rule is an iterated combination of the FNR and the SBR. Let $\ell$ be a node;
to apply this rule, a set of points $\{\ell_i\}$ is first defined in the following way:

$G_1 = S_\ell$

$\ell_i = \arg\max_{p \in G_i} d(p, M_\ell)$

$G_{i+1} = \{p \in G_i : d(p, M_r) < d(\ell_i, M_r)\}$

In preprocessing time, the distances $d(M_r, \ell_i)$ are stored in each node $\ell$.

^3 In the search procedure, each time a node expansion is needed, the distances between
each representative of the children and the test sample are calculated. After that, the
elimination rules are applied. Now, as the new rule SBR does not use $d(x, M_\ell)$, $\ell$
can be eliminated before the computation of $d(x, M_\ell)$.




Fig. 3. Generalised rule (GR). 



Definition of GR: given two sibling nodes $\ell$ and r, a test sample x, a current
candidate n to be the nearest neighbour, and the list of points $\ell_1, \ell_2, \ldots, \ell_s$,
the node $\ell$ can be pruned if there is an integer i such that:

$d(M_r, \ell_i) > d(M_r, x) + d(x, n)$    (1)

$d(M_\ell, \ell_{i+1}) < d(M_\ell, x) - d(x, n)$    (2)

The cases i = 0 and i = s are also included, not considering conditions (1) or (2)
respectively. Note that condition (1) is equivalent to the SBR when i = s and
condition (2) is equivalent to the FNR when i + 1 = 1.
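
The generalised rule can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: distances are Euclidean, the function names are hypothetical, and the sketch stores the distances of each point of the list to both representatives, since condition (2) needs the distance to the node's own representative as well.

```python
import math

def gr_lists(s_l, m_l, m_r):
    """Preprocessing for node l with sibling r.

    Returns two lists with d(M_r, l_i) and d(M_l, l_i) for i = 1..s.
    """
    g = list(s_l)
    to_r, to_l = [], []
    while g:
        e = max(g, key=lambda p: math.dist(p, m_l))   # farthest from M_l in G_i
        to_r.append(math.dist(m_r, e))
        to_l.append(math.dist(m_l, e))
        # G_{i+1}: points strictly closer to M_r than the point just chosen
        g = [p for p in g if math.dist(p, m_r) < math.dist(e, m_r)]
    return to_r, to_l

def gr_prunes(d_x_n, d_x_ml, d_x_mr, to_r, to_l):
    """True if some i in 0..s satisfies conditions (1) and (2)."""
    s = len(to_r)
    for i in range(s + 1):
        cond1 = i == 0 or to_r[i - 1] > d_x_mr + d_x_n   # SBR-like test on l_i
        cond2 = i == s or to_l[i] < d_x_ml - d_x_n       # FNR-like test on l_{i+1}
        if cond1 and cond2:
            return True
    return False
```

With i = 0 only the second test is used and the first stored distance is the radius of the node, which is the FNR; with i = s only the first test is used on the point of the node nearest to the sibling representative, which is the SBR.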

5 Experiments 

Some experiments with synthetic and real data were carried out to study the 
behaviour of the algorithm. 

The prototypes in the synthetic experiment set were extracted from a 6- 
dimensional uniform distribution in the unit hypercube. The Euclidean distance 
was used. 

All the experiments were repeated with 10 different sets of prototypes, 1000 
samples were used as test. 

A first set of experiments was carried out in order to study the behaviour of
the algorithm when using the c-means (the one proposed by Fukunaga and Narendra),
MSP and MSFP clustering algorithms to build the tree (Fig. 4).

The experiments show that c-means and MSP have a very similar behaviour^4
and that MSFP is clearly superior. This is because the saving of one distance
computation at each level compensates for the fact that the trees are deeper.

A second set of synthetic experiments was carried out to show the behaviour
of the algorithm when using the elimination rules FNR, FNR+SBR and GR
with the MSFP clustering algorithm (see Fig. 5).




Fig. 4. Influence on the Fukunaga and Narendra algorithm when c-means, MSP and
MSFP clustering algorithms are used in the tree building process with a 6-dimensional
uniform distribution (horizontal axis: number of prototypes).

As expected, the addition of the SBR slightly reduces the number of
distance computations, but the GR^5 reduces it drastically.

Some experiments using real data have also been made. In particular, the
PHONEME database from the ROARS ESPRIT project [8] was used. The
PHONEME database consists of 5404 5-dimensional vectors from 2 classes. The
set was divided into 5 subsets, using 4 subsets as prototypes and 1 subset as samples.
A leave-one-out technique was used.

The results plotted in Fig. 6 show the average number of distance
computations as the size of the training set increases. All the combinations of the
tree clustering algorithms and FNR and GR are shown. The behaviour with
this data seems similar to that obtained with artificial data, as the figure
illustrates.



^4 Note that MSP is much faster than c-means.

^5 Note that the FNR and the SBR are special cases of the GR.

Fig. 5. Influence on the Fukunaga and Narendra algorithm when MSFP is used to
build the tree and the rules FNR, FNR+SBR and GR are used in the search.

Fig. 6. Average number of distance computations per sample in relation to the size of
the training set for the PHONEME database.



6 Conclusions 



In this paper we have developed a series of improvements based on the algorithm
proposed by Fukunaga and Narendra. This algorithm builds a tree at preprocessing
time to speed up the nearest neighbour search.

On the one hand, two new methods to build the tree have been proposed.
The resulting tree is quicker to build and allows the search algorithm to find the
nearest neighbour with fewer distance computations.

On the other hand, two new elimination rules are proposed to speed up the
nearest neighbour search. The experiments suggest that high speed-ups can be
obtained.

In the future, we plan to apply these approaches to other nearest neighbour
search algorithms based on a tree structure. We are also interested in testing
these techniques in general metric spaces.
Abstract. One of the features involved in clustering is the evaluation
of distances between individuals. This paper deals with the use of
mixed metrics for clustering messy data. Indeed, when facing complex
real domains it becomes natural to deal simultaneously with numerical
and symbolic attributes. This can be treated with different approaches.
Here, the use of mixed metrics is followed.

In the paper, a family of mixed metrics introduced by Gibert is used
with different parameters on an experimental data set, in order to assess
the impact on the final classes.

Keywords: clustering, metrics, qualitative and quantitative variables,
messy data, ill-structured domains ...



1 Introduction 

Clustering is one of the most widely used techniques to separate data into groups. In
fact, we agree with the idea that a number of real applications in KDD either
require a clustering process or can be reduced to it [18]. Also, in apprehending
the world, men constantly employ three methods of organization, which pervade
all of their thinking: (i) the differentiation of experience into particular objects
and their attributes; (ii) the distinction between whole objects and their parts; and
(iii) the formation and distinction of different classes of objects. That is why
several well-known expert systems (MYCIN [23], ...) are actually classifiers.

However, when facing ill-structured domains such as mental disorders, sea sponges,
disabilities..., clustering has to be done on heterogeneous data matrices. In this
kind of domain (see [5], [6]), the consensus among experts is weak, and sometimes
non-existent; when describing objects, quantitative and qualitative information
coexists in what we call non-homogeneous data bases. Moreover, the
number of modalities of qualitative variables depends on the expertise of whoever is
describing the objects: the more they know about the domain, the greater the
number of modalities they use.

In this work, the mixed metrics introduced by Gibert in [4], [10] for measuring
distances with messy data are used. This measure has been successfully
implemented in a clustering system called Klass [5], [6] and applied to very different
ill-structured domains [7], [9], [11], [12].

* This research has been partially financed by the project CICYT’2000. 
The main goal of this paper is to study the behavior of different metrics of Gibert's
family (which also includes Ralambondrainy's proposals as particular cases) on a
set of experimental data that presents different structures, in order to study
which parameters of Gibert's metrics perform better in clustering, according
to the data structure. A formal approach to this problem requires too complex a
theoretical development. That is why an experimental approach is presented here
as a first step of the research. A similar experiment was also performed by Diday
in [2], comparing the performance of two metrics [14] and [13] in clustering. The next
step of this work is to make a global comparison.

This paper is organized as follows: after the general introduction, an overview
of the possibilities of working with messy data is presented. Then, details on the
indexed family of distances that combines qualitative and quantitative
information introduced by Gibert are presented in Section 3, together with several
proposals on the index values. In Section 4 the experiment context is provided. Section
4.1 introduces the experimental data sets, while Section 5 presents the main results.
Finally, the last section presents some conclusions and future work.



2 Clustering Heterogeneous Data Matrices 

Management of non-homogeneous data matrices requires, indeed, special attention
when classifying ill-structured domains. Standard clustering methods were
originally conceived to deal with quantitative variables. According to [1], data
analysis with heterogeneous data bases may follow three main strategies:

Variables partitioning. It consists of partitioning the variables according to their
type, then reducing the analysis to the dominant type (determined by the group
with the greatest number of variables, or the group containing the most relevant
variables, or the background knowledge on the domain...).

For example, if the dominant type is qualitative variables, then correspondence
analysis could be used and later a clustering on the factorial components is
possible [24], [17]. Since the classification is performed in a fictitious space,
additional tools are required to enable the interpretation of the results.

This approach of course misses the information provided by the non-dominant
groups of variables. A natural extension is to perform independent analyses inside
every type of variable. The problem, then, is the later integration of the results of
the parallel analyses to produce a consistent, coherent and unique final result. Even
in this case, interactions between variables of different types (like cooking
temperature and final color of a ceramic) cannot be analyzed under this approach.

Variables converting. This consists of converting all the variables to a unique type,
trying to conserve as much of the original information as possible. First of all,
the target type has to be decided. Conversion is not a trivial process (every variable
may be converted to a single one or split into a group of variables, or several
original variables may be grouped into a single transformed one). In Statistics,
symbolic variables have traditionally been converted to a set of binary variables
to generate the complete incidence table. Then, clustering using metrics may
be performed [3]. The dimensions of the complete incidence table imply a
significant cost increase. In Artificial Intelligence, grouping quantitative values into
a qualitative one [22] is much more popular. This transformation implies a
relevant loss of information as well as the introduction of some instability in the
results, which depend on the defined grouping.
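
As an illustration of the conversion to binary variables, the following minimal
sketch builds the indicator columns for a single symbolic variable. The function
name and the sample values are hypothetical and are not taken from the cited works.

```python
def complete_incidence_table(column):
    """Convert one symbolic variable into binary indicator variables.

    Returns the sorted list of modalities and one 0/1 row per individual.
    """
    modalities = sorted(set(column))
    table = [[1 if value == m else 0 for m in modalities] for value in column]
    return modalities, table


# Hypothetical symbolic variable: final color of four ceramic pieces.
colors = ["red", "white", "red", "brown"]
modalities, table = complete_incidence_table(colors)
```

A variable with many modalities yields as many binary columns, which is the cost
increase mentioned above.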

Many authors, among them [1], [16], discuss different strategies along this line,
together with the associated problems: loss of relevant information, or even
a difficult final interpretation, since the transformed variables may lie in
a fictitious space. Also, in [5] it is shown how converting all the variables to
qualitative ones almost always introduces a bias in the results, which can
sometimes even be arbitrary.



Compatibility measures. This consists of using compatible measures which
cover any combination of variable types, giving a homogeneous treatment
to all the variables. For instance, a meaningful distance (or similarity) between
individuals can be defined which uses a different expression for every variable
type.

The idea is to allow clustering on a domain simultaneously described by
qualitative and quantitative variables without transforming the variables themselves.
Since distances between individuals have to be calculated at the core of the
classification process, a function to do so with non-homogeneous data has to be found.
Several proposals along this line can be found in the literature. Following the
discussions in [4] and [5], this is the approach of this work. Its main advantages
are: the original nature of the data is respected, there is no loss of information,
no previous arbitrary decisions which could bias the results are needed, all the
variables can be studied together, and interactions between variables of different
types can be analyzed. Proposals along this line are, chronologically:
Gower 71 [14], Gowda & Diday 91 [13], Gibert 91 [4,8], Ichino & Yaguchi
94 [15], Ralambondrainy 95 [21], Ruiz-Shulcloper [19].



3 Gibert’s Mixed Metrics 

The input of a clustering algorithm is a data matrix with the values of $K$ variables
$X_1, \ldots, X_K$ observed over a set $I = \{1, \ldots, n\}$ of individuals. Variables are
represented in columns and individuals in the rows of the data matrix. The cells contain
the value $x_{ik}$ taken by individual $i \in I$ for variable $X_k$ ($k = 1 : K$). In our
context, heterogeneous data matrices are assumed, so let $C \subseteq \{1, \ldots, K\}$ be the
indexes of numerical variables and $Q = \{1, \ldots, K\} - C$ the indexes of categorical
variables, with $n_C = \mathrm{card}(C)$ and $n_Q = \mathrm{card}(Q)$.

The mixed metrics introduced by Gibert in [4], [10] is defined, for clustering
purposes, as a family of metrics indexed by the pair $(\alpha, \beta)$:

$$\{ d^2_{(\alpha,\beta)}(i,i') \}_{(\alpha,\beta) \in [0,1] \times [0,1]} \qquad (1)$$
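where $d^2_{(\alpha,\beta)}(i,i') = \alpha\, d^2_C(i,i') + \beta\, d^2_Q(i,i')$; $(\alpha, \beta)$ are indexes weighting the
influence of the variables in $C$ versus those in $Q$; $d^2_C(i,i')$ is the normalized Euclidean
metrics calculated with the variables in $C$; and $d^2_Q(i,i')$ is a rewriting of the $\chi^2$ metrics
calculated with the variables in $Q$, supporting symbolic representation:

$$d^2_C(i,i') = \sum_{k \in C} \frac{(x_{ik} - x_{i'k})^2}{s_k^2}, \qquad d^2_Q(i,i') = \frac{1}{n_Q^2} \sum_{k \in Q} d^2_k(i,i') \qquad (2)$$

where $s_k^2 = \mathrm{var}(X_k)$. Referring to $d^2_k(i,i')$, $I^{k_j}$ is the number of observations
equal to the $j$-th modality of $X_k$ (namely $c^k_j$), and $I^{k_i} = \mathrm{card}\{\hat{\imath} : x_{\hat{\imath}k} = x_{ik}\}$. An
extended value appears for a class representative if $X_k$ is not constant inside
the class; it is represented as $(f_i^{k_1}, \ldots, f_i^{k_{n_k}})$, where $f_i^{k_j}$, $j = 1, 2, \ldots, n_k$, is the
proportion of objects of the class represented by $i$ with $x_{ik} = c^k_j$. Then

$$d^2_k(i,i') = \begin{cases}
0, & \text{if } x_{ik} = x_{i'k} \\
\dfrac{1}{I^{k_i}} + \dfrac{1}{I^{k_{i'}}}, & \text{if } x_{ik} \neq x_{i'k} \\
\dfrac{(f_{i'}^{k_s} - 1)^2}{I^{k_s}} + \sum_{j \neq s} \dfrac{(f_{i'}^{k_j})^2}{I^{k_j}}, & \text{if } x_{ik} = c^k_s \text{ and } i' \text{ extended on } X_k
\end{cases}$$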
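
The definition above can be sketched for the simplest case, in which no
extended values occur. The sketch assumes the reading
$d^2_{(\alpha,\beta)} = \alpha \sum_{k \in C} (x_{ik} - x_{i'k})^2 / s_k^2 + \beta\, n_Q^{-2} \sum_{k \in Q} d^2_k$,
with $d^2_k = 0$ for equal modalities and $1/I^{k_i} + 1/I^{k_{i'}}$ otherwise. The
function name, the use of the population variance and the toy data are
assumptions made for illustration only.

```python
from collections import Counter
from statistics import pvariance


def mixed_distance_sq(data, num_idx, cat_idx, i, j, alpha, beta):
    """Squared mixed distance between rows i and j (no extended values)."""
    # Numerical part: squared differences normalized by each variable's variance
    # (population variance is an assumption of this sketch).
    d_c = 0.0
    for k in num_idx:
        column = [row[k] for row in data]
        d_c += (data[i][k] - data[j][k]) ** 2 / pvariance(column)
    # Categorical part: chi-squared style term based on modality frequencies.
    d_q = 0.0
    for k in cat_idx:
        counts = Counter(row[k] for row in data)
        if data[i][k] != data[j][k]:
            d_q += 1.0 / counts[data[i][k]] + 1.0 / counts[data[j][k]]
    if cat_idx:
        d_q /= len(cat_idx) ** 2
    return alpha * d_c + beta * d_q


# Toy data matrix: one numerical and one categorical variable.
data = [(1.0, "a"), (3.0, "b"), (5.0, "a"), (7.0, "b")]
d01 = mixed_distance_sq(data, [0], [1], 0, 1, alpha=0.5, beta=0.5)
d02 = mixed_distance_sq(data, [0], [1], 0, 2, alpha=0.5, beta=0.5)
```

Changing the pair `(alpha, beta)` selects a different element of the family
while the two components themselves stay fixed.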
In [10] a heuristic criterion is used to propose proper values $(\alpha_0, \beta_0)$ for the
indexes $(\alpha, \beta)$, based on $d^2_{C\max*} = \max_{i,i'}\{d^2_C(i,i')\}$ and
$d^2_{Q\max*} = \max_{i,i'}\{d^2_Q(i,i')\}$. These values^
refer the two components of the distance to a common interval, in order to give
them equal influence in the determination of $d^2_{(\alpha,\beta)}(i,i')$; each component
also receives a weight proportional to its presence in the object descriptions.



Ralambondrainy proposal. In [21], Ralambondrainy also proposes a metrics
to work with heterogeneous data matrices; it is defined exactly as expression (1).
In [20], two practical ways of standardization for calculating $(\alpha, \beta)$ are presented:

- by the inertia: = ^ ; 7T2 = 

- by the norm: - 1 : fc € Q}, 

where $\rho^2(X_k, X_{k'})$ refers to the correlation between $X_k$ and $X_{k'}$, and $n_k$ is the
number of modalities of $X_k$.

These proposals identify two elements of Gibert's family of mixed distances
that will also be considered in this paper.

^ Maximums can also be truncated to the 95% in order to acquire more robustness. 
4 The Experiment

As said before, the main goal of this paper is to analyze the behavior of different
elements of Gibert's family in the clustering of different data sets, and an experiment
was designed accordingly. For this work a single clustering algorithm is
considered: a hierarchical reciprocal neighbors algorithm using the Ward criterion (see
[24]). In future work other algorithms will also be taken into account. As a first
approach, four elements of Gibert's family are considered in the experiment:
$d^2_{(\alpha_0,\beta_0)}(i,i')$, proposed by Gibert; the two elements obtained with the
standardizations (by the inertia and by the norm) proposed by
Ralambondrainy; and $d^2_{(0.5,0.5)}(i,i')$, which represents a non-informed option with
equal contribution of both components to the distance.

On the other hand, the experimental data has to be simulated (see §4.1). The
structure of the data sets was decided on the basis of factors that can influence
the behavior of the metrics in the clustering process: the distinguishability
of the classes is relevant (that is why some data sets contain overlapping
classes and others separated ones; the variance of the classes is also considered);
the form of the classes is also important (recognition of convex or filiform classes
is tested); finally, different numbers of classes are tested.

For every data set, four clustering processes are performed, one with
each of the metrics indicated above. From the results of every clustering, relevant
information is coded in a new data matrix, and a multivariate analysis is
done with it to see the relationships among the different runs. It seems reasonable to
judge good behavior on the basis of the recognition of the real data structure, which
is easy with simulated data, since the real class of every object is known a priori.



4.1 The Simulated Data Set 

The experimental data also follows the guidelines presented in [2],
where several hierarchical clustering methods are compared using
several kinds of experimental data with different structures. Figure 1 shows the
experimental data sets used in this work. Note that it only shows the
structure of the numerical part of the data sets. Every data set also contains as many
categorical variables as numerical ones, randomly generated with 3 modalities.

Some data sets correspond to the proposal presented in [2]; others were
specially introduced for our purposes. The basic structures from [2] are: concentric
classes (fig. 1(d)), chained classes (fig. 1(f)), a mixture of convex and concentric
classes (fig. 1(e)) and filiform classes in 2D (fig. 1(g)), since it is known that
certain clustering algorithms have trouble recognizing them. Following
the discussion introduced previously, and to widen the scope of the
analysis, other structures were added to the experiment: uniform, representing
lack of structure (fig. 1(j)); convex classes (fig. 1(a,b,c)), which are supposed to
be the easiest to recognize, with the variability of the classes increasing from (a) to (c)
in such a way that the distinguishability of the classes decreases; and, finally, filiform
classes in three dimensions (see fig. 1(h)).

Fig. 1. Experimental data sets. Panel titles: (d) Circular 01 (C1), (e) Circular 02 (C2),
(f) Circular 03 (C3), (g) Filiform 01 (F1), (h) Filiform 02 (F2), (i) Filiform 03 (F3),
(j) Uniform (U).



5 Results 

Every data set is clustered using the four metrics given in §4. Then some relevant
information on the results (like the number of resulting classes, which is an output
in hierarchical clustering, the size of every class, the number of real classes, the form
of the real classes, etc.) is used for a later Principal Components Analysis, in order to study
the relationships among runs; the misclassification rate is used as a quality measure
of the runs (fig. 2 shows the projection of the runs on the first factorial plane).

Fig. 2. First factorial plane with the experimental runs represented.

It seems that the first axis opposes less structured data sets (on the left
hand side, like uniform) to more structured ones (on the right, filiform 02
or circular 03). The most remarkable thing, regarding the second axis, is that,
given a data set, its four runs tend to be displayed vertically in two subsets: on the
lower side, the clustering performed with Gibert's proposal $d^2_{(\alpha_0,\beta_0)}(i,i')$ (-MA in
the figure), much lower than a second group where the rest of the runs appear
in close neighborhood (except for circular 02 and filiform 03, for which the runs
labelled -R2 in the figure are projected in intermediate positions). So, in
general, it can be said that the second axis opposes Gibert's proposal for $(\alpha, \beta)$
to the other ones, which are difficult to distinguish.

6 Conclusions and Future Work 

It has been seen that changing the metrics produces real effects on the clustering
results. It is therefore important to know when different metrics behave better
at recognizing the real classes.

Of the four studied elements of Gibert's family, $d^2_{(\alpha_0,\beta_0)}(i,i')$ is the one
which produces the most different results on the experimental data sets used. It
seems, from this work, that the other three possibilities do not produce great
differences with the clustering method used. In addition, for the case filiform 03,
$d^2_{(\alpha_0,\beta_0)}(i,i')$ allows recognition of the real classes.

The next step is to complete the experiment in order to check whether this separate
behavior of $d^2_{(\alpha_0,\beta_0)}(i,i')$ is maintained, and whether it is possible to obtain more
knowledge on the other metrics; it will be interesting to work with different structures
in the categorical part of the data matrix, which was fixed to a uniform
distribution for this work. After that, a comparison with the results reported in [2] will
also be made, as well as with other proposals from the literature, like the Gower coefficient.
In the last step, including other clustering algorithms will enable the study of more
general properties of these metrics.
Abstract. In this paper, a comparative analysis of the mixed-type variable fuzzy
c-means (MVFCM) and the fuzzy c-means using dissimilarity functions
(FCMD) algorithms is presented. Our analysis focuses on the dissimilarity
function and the way of calculating the centers (or representative objects) in
both algorithms.



1 Introduction 

Restricted unsupervised classification (RUC) problems have been studied intensely in
Statistical Pattern Recognition (Schalkoff, 1992). The fuzzy c-means algorithm is
based on a metric over an n-dimensional space. It has shown its effectiveness in
solving many unsupervised classification problems.

The fuzzy c-means algorithm starts with an initial partition and then iteratively tries
all possible moves or swaps of data from one group to another in order to optimize the
objective function. The objects must be described in terms of features
such that a metric can be applied to evaluate the distance. Nevertheless, the conditions
in soft sciences such as Medicine, Geology, Sociology, Marketing, etc., are quite different.
In these sciences, the objects are described in terms of quantitative and qualitative
features (mixed data). For example, in geological data, features such as
age, porosity and permeability are quantitative, while others such as rock type,
crystalline structure and facies structure are qualitative. Likewise, missing data is
common in this kind of problem. In these circumstances, it is not possible to measure
the distance between objects; only the degree of similarity can be determined.

The mixed-type variable fuzzy c-means algorithm (MVFCM) of Yang
et al. (2003) and the fuzzy c-means using dissimilarity functions (FCMD) of
Ayaquica (2002) (see also Ayaquica and Martinez (2001)) are the most recent works
that solve the RUC problem when mixed data appear.

In this paper, the mixed-type variable fuzzy c-means and the fuzzy c-means using
dissimilarity functions algorithms are analyzed and compared.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 472-479, 2003.

© Springer-Verlag Berlin Heidelberg 2003 
2 Mixed-Type Variable Fuzzy C-Means Algorithm (MVFCM)

In this section, the mixed-type variable fuzzy c-means algorithm (MVFCM) of Yang
et al. (2003) is presented. They proposed a dissimilarity function to handle symbolic
and fuzzy features.

The dissimilarity function used to evaluate the dissimilarity between symbolic
features is the function proposed by Gowda and Diday, with some modifications.
According to Gowda and Diday, the symbolic features can be divided into
quantitative and qualitative, and for each feature $k$ the dissimilarity can be defined by
a component $d_p(A_k, B_k)$ due to position $p$, a component $d_s(A_k, B_k)$ due to span $s$ and
a component $d_c(A_k, B_k)$ due to content $c$ (see Yang et al. (2003) for details).

a) Quantitative features: $d(A_k, B_k) = d_p(A_k, B_k) + d_s(A_k, B_k) + d_c(A_k, B_k)$

b) Qualitative features: $d(A_k, B_k) = d_s(A_k, B_k) + d_c(A_k, B_k)$



Let $A = m(a_1, a_2, a_3, a_4)$ and $B = m(b_1, b_2, b_3, b_4)$ be any two fuzzy numbers. The
dissimilarity $d_f(A,B)$ is defined as

$$d_f^2(A,B) = \frac{1}{4}\left[ g_-^2 + g_+^2 + (g_- - (a_3 - b_3))^2 + (g_+ + (a_4 - b_4))^2 \right]$$

where $g_- = 2(a_1 - b_1) - (a_2 - b_2)$ and $g_+ = 2(a_1 - b_1) + (a_2 - b_2)$.
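
A minimal sketch of this fuzzy dissimilarity follows, assuming the reading
$d_f^2(A,B) = \frac{1}{4}[g_-^2 + g_+^2 + (g_- - (a_3 - b_3))^2 + (g_+ + (a_4 - b_4))^2]$
with $g_\mp = 2(a_1 - b_1) \mp (a_2 - b_2)$. The function name and the sample
fuzzy numbers are illustrative choices.

```python
def fuzzy_dissimilarity_sq(a, b):
    """Squared dissimilarity between fuzzy numbers a = (a1, a2, a3, a4) and b."""
    d1, d2, d3, d4 = (x - y for x, y in zip(a, b))
    g_minus = 2 * d1 - d2
    g_plus = 2 * d1 + d2
    return 0.25 * (g_minus ** 2 + g_plus ** 2
                   + (g_minus - d3) ** 2 + (g_plus + d4) ** 2)


# Two pairs of example fuzzy numbers in the parametric form m(a1, a2, a3, a4).
same_shape = fuzzy_dissimilarity_sq((10, 0, 2, 2), (6, 0, 2, 2))
different_shape = fuzzy_dissimilarity_sq((12, 4, 2, 0), (10, 0, 2, 2))
```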

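Then, the dissimilarity function for both symbolic and fuzzy features, expression (1),
is the sum of the dissimilarity between symbolic features and the dissimilarity
between fuzzy features.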



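Let $\{X_1, \ldots, X_n\}$ be a data set of mixed feature types. The MVFCM objective
function is defined as

$$J(\mu, e, a) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \, d^2(X_j, A_i)$$

where $d^2(X_j, A_i)$ is (1).

The equation to evaluate the membership degree is

$$\mu_{ij} = \left[ \sum_{q=1}^{c} \left( \frac{d^2(X_j, A_i)}{d^2(X_j, A_q)} \right)^{1/(m-1)} \right]^{-1}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n \qquad (2)$$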
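
The membership update can be sketched as follows, assuming the usual fuzzy
c-means form $\mu_{ij} = [\sum_q (d^2(X_j, A_i) / d^2(X_j, A_q))^{1/(m-1)}]^{-1}$
and strictly positive dissimilarities. The function name and the numbers are
hypothetical.

```python
def memberships(sq_dissim, m=2.0):
    """Fuzzy memberships from squared dissimilarities.

    sq_dissim[i][j] is the squared dissimilarity between cluster center i
    and datum j; every entry is assumed to be strictly positive.
    """
    c, n = len(sq_dissim), len(sq_dissim[0])
    power = 1.0 / (m - 1.0)
    mu = [[0.0] * n for _ in range(c)]
    for j in range(n):
        for i in range(c):
            total = sum((sq_dissim[i][j] / sq_dissim[q][j]) ** power
                        for q in range(c))
            mu[i][j] = 1.0 / total
    return mu


# Two clusters, two data: the first datum is three times closer to cluster 0.
mu = memberships([[1.0, 4.0], [3.0, 4.0]], m=2.0)
```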



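This algorithm evaluates two cluster centers, one for symbolic features and one for
fuzzy features. For symbolic features the center is
$A_{ik'} = [(a_{k'}^{(1)}, e_{ik'}^{(1)}), \ldots, (a_{k'}^{(p)}, e_{ik'}^{(p)})]$, where $a_{k'}^{(p)}$ is the $p$th event of symbolic
feature $k'$ in cluster $i$ and $e_{ik'}^{(p)}$ is the membership degree of association of the $p$th
event to the feature $k'$ in the cluster $i$.

The equation for $e_{ik'}^{(p)}$ is

$$e_{ik'}^{(p)} = \frac{\sum_{j=1}^{n} \mu_{ij}^m \, \theta}{\sum_{j=1}^{n} \mu_{ij}^m} \qquad (3)$$

where $\theta \in \{0, 1\}$, with $\theta = 1$ if the $k'$th feature of the $j$th datum $X_j$ consists of the
$p$th event and $\theta = 0$ otherwise. The membership of $X_j$ in the cluster $i$ is $\mu_{ij} = \mu_i(X_j)$.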
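
The event memberships of a symbolic cluster center can be sketched as below.
The sketch assumes that each datum's symbolic value is a set of events and
that the indicator is 1 when the event belongs to that set; the function name
and data are hypothetical.

```python
def event_memberships(mu_i, values, events, m=2.0):
    """Membership of each event in the symbolic center of one cluster.

    mu_i[j] is the membership of datum j in the cluster; values[j] is the
    set of events taken by datum j on the symbolic feature.
    """
    weights = [u ** m for u in mu_i]
    total = sum(weights)
    return {
        p: sum(w for w, v in zip(weights, values) if p in v) / total
        for p in events
    }


# Three data described by a color feature, with memberships 1.0, 0.5 and 0.0.
center = event_memberships(
    [1.0, 0.5, 0.0],
    [{"W", "S"}, {"W"}, {"R"}],
    ["W", "S", "R"],
)
```

Events carried only by data with negligible membership receive a weight close
to zero, which is why such a center does not correspond to any object of the
sample.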

For fuzzy features the center is calculated considering $A_{ik}$ as the $k$th fuzzy feature
of the $i$th cluster center with parametric form $A_{ik} = m(a_{ik1}, a_{ik2}, a_{ik3}, a_{ik4})$, where

Y'' //"'(Sv — r +r A- n — n j (^) 
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Z” - a , ,3 - 

““ = ^ ’ 

Z + x .,3 + 2 fl,„ - a., 3 ) 

Z” 1 A,j( 2 XyH + + Xju - 2 %, - 



MVFCM Algorithm

Step 1: Fix $m$ and $c$. Give $\varepsilon > 0$. Initialize a fuzzy c-partition
$\mu^{(0)} = \{\mu_1^{(0)}, \ldots, \mu_c^{(0)}\}$. Set $l = 0$.

Step 2: For symbolic feature $k'$, compute the $i$th cluster center
$A_{ik'}^{(l)}$ using (3). For fuzzy feature $k$, compute the $i$th
cluster center $A_{ik}^{(l)} = m(a_{ik1}^{(l)}, a_{ik2}^{(l)}, a_{ik3}^{(l)}, a_{ik4}^{(l)})$ using (4).

Step 3: Update $\mu^{(l+1)}$ using (2).

Step 4: Compare $\mu^{(l+1)}$ with $\mu^{(l)}$ in a convenient matrix norm.
IF $\|\mu^{(l+1)} - \mu^{(l)}\| < \varepsilon$, THEN STOP
ELSE $l = l + 1$ and GOTO Step 2.

In this algorithm a dissimilarity function defined as the sum of the dissimilarity
between symbolic features and the dissimilarity between fuzzy features is used to
solve the mixed data problem. However, the dissimilarity for symbolic and fuzzy
features is always computed using the expressions $d_p$, $d_s$, $d_c$ and $d_f$, respectively. In
practice, the way of evaluating the similarity between feature values depends not only on
the nature of the features; the context of the problem must also be
considered. When $d_p$, $d_s$, $d_c$ and $d_f$ are used, we are forced to evaluate the dissimilarity
always in the same form, independently of the context or nature of the problem. A
fixed function cannot represent the criterion used by the specialist to
compare these features in a given context. Therefore, even if two features are of the
same type, the way of comparing them need not be the same.

On the other hand, this algorithm evaluates two cluster centers, one for symbolic
features and another for fuzzy features. These cluster centers are fictitious elements,
i.e. they cannot be represented in the same space as the objects; however,
they are used to classify the objects.



3 Fuzzy C-Means Algorithm Using Dissimilarity Functions (FCMD) 

In this section, the fuzzy c-means algorithm using dissimilarity functions (FCMD) of
Ayaquica and Martinez (2001) is presented.

Let us consider a clustering problem where a data set of $n$ objects $\{O_1, O_2, \ldots, O_n\}$
should be classified into $c$ clusters. Each object is described by a set $R = \{x_1, x_2, \ldots, x_m\}$
of features. The features take values in a set of admissible values $D_i$, $x_i(O_j) \in D_i$,
$i = 1, 2, \ldots, m$. We assume that in $D_i$ there exists a symbol "?" to denote missing data.
Thus, the features can be of any nature (qualitative: Boolean, multi-valued, etc., or
quantitative: integer, real) and incomplete descriptions of the objects can be
considered.

For each feature a comparison criterion $C_i : D_i \times D_i \to L_i$ is defined,
where $L_i$ is a totally ordered set. This function evaluates the similarity
between two values of a feature. In practice, it is defined according to the
way in which two values of the feature are compared. When
features are numeric, a norm or distance is usually used, but this is not the only
way to evaluate the similarity between values. Therefore, in the formulation proposed
here, this function is a parameter that the user can define according to the problem.

Some examples of comparison criteria are:

1. $C_s(x_s(O_i), x_s(O_j)) = \begin{cases} 0 & \text{if } x_s(O_i) = x_s(O_j) \vee x_s(O_i) = ? \vee x_s(O_j) = ? \\ 1 & \text{otherwise} \end{cases}$

where $x_s(O_i)$ is the value of the feature $x_s$ in the object $O_i$; 0 means that the values
coincide and 1 means that the values are different; "?" denotes missing data. This is a
Boolean comparison criterion.

2. $C_s(x_s(O_i), x_s(O_j)) = \begin{cases} 0 & \text{if } x_s(O_i) = x_s(O_j) \vee x_s(O_i) = ? \vee x_s(O_j) = ? \\ 1 & \text{if } x_s(O_i), x_s(O_j) \in A_1 \\ 2 & \text{if } x_s(O_i), x_s(O_j) \in A_2 \\ \vdots & \\ k-1 & \text{if } x_s(O_i), x_s(O_j) \in A_{k-1} \end{cases}$

where $A_1 \cup \ldots \cup A_{k-1} = D_s$. This is a $k$-value comparison criterion.

3. $C_s(x_s(O_i), x_s(O_j)) = |x_s(O_i) - x_s(O_j)|$. This is a comparison criterion that takes real values.

In this way, a unique comparison criterion is not fixed for all problems;
instead, the user is free to choose the comparison criterion which best reflects
the way the objects are compared in practice. Note that the dissimilarity
functions defined by Yang et al. (2003) for quantitative and qualitative features may
be used too.
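
The first and third criteria can be written as ordinary functions, which makes
explicit that a comparison criterion is simply a user-supplied parameter. This
is a minimal sketch; the function names and the use of the string "?" as the
missing-data symbol in code are illustrative assumptions.

```python
MISSING = "?"


def boolean_criterion(a, b):
    """Criterion 1: 0 if the values coincide or either one is missing, else 1."""
    if a == MISSING or b == MISSING or a == b:
        return 0
    return 1


def absolute_difference(a, b):
    """Criterion 3: absolute difference between two real values."""
    return abs(a - b)


same = boolean_criterion("Ford", "Ford")
unknown = boolean_criterion("Ford", MISSING)
different = boolean_criterion("Ford", "Toyota")
gap = absolute_difference(2.0, 1.5)
```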

In addition, let $\Psi : (D_1 \times \ldots \times D_m)^2 \to [0,1]$ be a dissimilarity function. This function
evaluates the dissimilarity between object descriptions and is
defined in terms of the comparison criteria.

Let

$$\Psi(O_j, O_k) = \frac{1}{|R|} \sum_{x_s \in R} C_s(x_s(O_j), x_s(O_k)) \qquad (5)$$

be the dissimilarity between the objects $O_j$ and $O_k$. The value $\Psi(O_j, O_k)$ satisfies the
following three conditions:

1. $\Psi(O_j, O_k) \in [0,1]$ for $1 \le j \le n$, $1 \le k \le n$
2. $\Psi(O_j, O_j) = 0$ for $1 \le j \le n$
3. $\Psi(O_j, O_k) = \Psi(O_k, O_j)$ for $1 \le j \le n$, $1 \le k \le n$
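
A minimal sketch of such a dissimilarity function follows, assuming it is the
average of per-feature comparison criteria that each return values in [0, 1].
The criteria, the scaling range and the sample objects are hypothetical; the
final lines check the three conditions on the sample.

```python
MISSING = "?"


def boolean_criterion(a, b):
    """0 if the values coincide or one is missing, 1 otherwise."""
    return 0 if (a == MISSING or b == MISSING or a == b) else 1


def scaled_difference(low, high):
    """Build a criterion returning |a - b| scaled to [0, 1] over [low, high]."""
    def criterion(a, b):
        if a == MISSING or b == MISSING:
            return 0.0
        return abs(a - b) / (high - low)
    return criterion


def dissimilarity(obj_a, obj_b, criteria):
    """Average of the per-feature comparison criteria."""
    return sum(c(a, b) for c, a, b in zip(criteria, obj_a, obj_b)) / len(criteria)


criteria = [boolean_criterion, scaled_difference(1.3, 2.0)]
objects = [("China-Motor", 1.8), ("Ford", 1.6), ("Toyota", MISSING)]

values = [[dissimilarity(a, b, criteria) for b in objects] for a in objects]
in_range = all(0.0 <= v <= 1.0 for row in values for v in row)
zero_diagonal = all(values[i][i] == 0 for i in range(len(objects)))
symmetric = all(values[i][j] == values[j][i]
                for i in range(len(objects)) for j in range(len(objects)))
```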



Let $u_{ik}$ be the degree of membership of the object $O_k$ in the cluster $K_i$, and let
$\mathbb{R}^{c \times n}$ be the set of all real $c \times n$ matrices. Any fuzzy c-partition of the data set is
represented by a matrix $U = [u_{ik}] \in \mathbb{R}^{c \times n}$. The fuzzy c-partition matrix $U$ is determined by
minimizing an objective function $J(U, \Theta)$, summed over the objects $k = 1, \ldots, n$ and
the clusters $i = 1, \ldots, c$, where $\Theta$
is a set of representative objects, one for each $K_i$, and $\Psi(O_k, O_i^*)$ is the dissimilarity
between the object $O_k$ and the representative object $O_i^*$ of $K_i$. In the case of classical
fuzzy c-means, $\Theta$ are the centers of the clusters and $\Psi$ is the Euclidean distance.

Since in our algorithm the object descriptions are not only in terms of quantitative
features, the mean cannot be computed. Then, instead of using a center (or centroid) for a
cluster, we use an object in the sample as the representative of this cluster. In order
to determine a representative object $O_i^*$ for the cluster $K_i$, $i = 1, \ldots, c$, we proceed as
follows:

We consider the crisp subset $K_i'$ of objects that have their maximum degree of
membership in the cluster $K_i$. Then the representative object of cluster $K_i$ is
determined as the object $O_i^*$ that satisfies expression (6).

The object $O_i^*$ may not be unique; in that case we take the first object found.

Note that our algorithm takes as representative object one object of the sample
instead of a fictitious element, as occurs in the MVFCM algorithm.

In order to determine the degree of membership of the object $O_k$ to the cluster $K_i$,
we define for each object $O_k$ the set $I_k = \{ i \mid 1 \le i \le c;\ \Psi(O_k, O_i^*) = 0 \}$. This set
contains the indexes of the clusters such that the dissimilarity between the object to
classify, $O_k$, and the representative object $O_i^*$ is zero. The set
$\tilde{I}_k = \{1, 2, \ldots, c\} - I_k$ contains the indexes of the clusters such that the
dissimilarity between $O_k$ and $O_i^*$ is greater than zero.

Thus, the degree of membership of $O_k$ to $K_i$ is computed via (7a) or (7b).

$$I_k = \emptyset \;\Rightarrow\; u_{ik} = \frac{1}{\sum_{j=1}^{c} \left[ \dfrac{\Psi(O_k, O_i^*)}{\Psi(O_k, O_j^*)} \right]^2} \qquad (7a)$$

We can see that the degree of membership $u_{ik}$ increases if simultaneously the
dissimilarity between $O_k$ and $O_i^*$ for $K_i$ decreases and the dissimilarity between $O_k$ and
$O_j^*$ for $K_j$, $j = 1, \ldots, c$, increases (and vice versa).

$$I_k \ne \emptyset \;\Rightarrow\; u_{ik} = 0 \;\; \forall i \in \tilde{I}_k \quad \text{and} \quad \sum_{i \in I_k} u_{ik} = 1 \qquad (7b)$$

Equation (7b) is the alternative form for $O_k$ when $\exists\, i \in I_k$ such that $\Psi(O_k, O_i^*) = 0$.
The membership of $O_k$ to the clusters $K_i$, $i \in I_k$, will be $1/|I_k|$, i.e., the degree of
membership is distributed among the clusters $K_i$, $i \in I_k$. In addition, for the clusters
$i \in \tilde{I}_k$ we assign zero as degree of membership.
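
The two membership cases can be sketched as follows. The zero-dissimilarity
case follows the description above; for the other case the sketch assumes the
inverse of a sum of squared dissimilarity ratios, an assumption that reproduces
the FCMD column of Table 2 from the normalized values of Table 3 to about three
decimals when objects 9 and 2 are taken as the two representatives (also an
assumption). The function name is hypothetical.

```python
def fcmd_memberships(dissim_to_reps):
    """Membership degrees of one object, given its dissimilarity to the
    representative object of each cluster (one value per cluster)."""
    c = len(dissim_to_reps)
    zero = [i for i, d in enumerate(dissim_to_reps) if d == 0]
    if zero:
        # Zero-dissimilarity case: share the membership among those clusters.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(c)]
    # General case (assumed form): inverse of the sum of squared ratios.
    return [1.0 / sum((d_i / d_j) ** 2 for d_j in dissim_to_reps)
            for d_i in dissim_to_reps]


# Object 1 of the automobile example: normalized dissimilarities to
# object 9 (0.0062) and to object 2 (0.0213), read from Table 3.
u_object1 = fcmd_memberships([0.0062, 0.0213])
# An object that coincides with the first representative.
u_representative = fcmd_memberships([0.0, 0.0460])
# An object at zero dissimilarity from two of three representatives.
u_shared = fcmd_memberships([0.0, 0.0, 0.4])
```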

FCMD Algorithm

Step 1. Fix $c$, $2 \le c < n$. Set $l = 0$.
Step 2. Select $c$ objects in the data as representative objects, $\Theta^{(l)}$.
Step 3. Calculate the c-partition $U^{(l)}$ using (7a) and (7b).
Step 4. Determine the representative objects for each fuzzy cluster using (6), $\Theta^{(l+1)}$.
Step 5. If $\Theta^{(l)} = \Theta^{(l+1)}$ then STOP.
Otherwise $l = l + 1$ and go to step 3.
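
The loop structure of these steps can be sketched as below. Two parts are
assumptions rather than the authors' definitions: the selection rule of
expression (6) is replaced by a medoid rule (the member of the crisp subset
with the smallest total dissimilarity to the other members), and the
membership for non-zero dissimilarities uses squared ratios. The toy data are
hypothetical.

```python
def memberships(d):
    """Memberships of one object from its dissimilarities to the representatives."""
    zero = [i for i, v in enumerate(d) if v == 0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(d))]
    return [1.0 / sum((d_i / d_j) ** 2 for d_j in d) for d_i in d]


def fcmd(dissim, initial_reps, max_iter=100):
    """dissim: symmetric n x n matrix; initial_reps: indexes of c objects."""
    n = len(dissim)
    reps = list(initial_reps)
    u = []
    for _ in range(max_iter):
        # Step 3: membership of every object in every cluster.
        u = [memberships([dissim[k][r] for r in reps]) for k in range(n)]
        # Step 4 (assumed medoid rule standing in for expression (6)).
        new_reps = []
        for i, old in enumerate(reps):
            members = [k for k in range(n) if u[k].index(max(u[k])) == i]
            if not members:
                new_reps.append(old)
                continue
            new_reps.append(min(
                members, key=lambda q: sum(dissim[k][q] for k in members)))
        # Step 5: stop when the representatives no longer change.
        if new_reps == reps:
            break
        reps = new_reps
    return reps, u


# Four objects on a line at positions 0, 1, 10 and 11, scaled to [0, 1].
positions = [0.0, 1.0, 10.0, 11.0]
dissim = [[abs(a - b) / 11.0 for b in positions] for a in positions]
reps, u = fcmd(dissim, [0, 2])
crisp = [row.index(max(row)) for row in u]
```

Because every representative is an index into the data, the result can always
be read back as an actual object of the sample.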

An important point to highlight is the flexibility of this algorithm: it uses a
dissimilarity function which is defined in terms of comparison criteria. The
comparison criteria express the way in which feature values are compared,
depending on the context of the problem to solve.

This algorithm, unlike the MVFCM algorithm, evaluates a unique “cluster center”
called the representative object, which is an object of the sample instead of a
fictitious element. It is more reasonable to take
an object of the data set being classified as the representative object than to use an
element that cannot be represented in the same space as the objects.



4 Analysis 

In this section, the c-partitions generated by both algorithms are analyzed. The
analysis is based on the way the membership degrees and
the cluster centers (representative objects) are calculated.

For the analysis, the data set shown in Table 1 was used. There are 10
brands of automobiles from four companies: Ford, Toyota, China-Motor and Yulon-Motor
in Taiwan. Each brand has six feature components: company, exhaust,
price, color, comfort and safety. In the color feature, the notations W=white, S=silver,
D=dark, R=red, B=blue, G=green, P=purple, Gr=grey and Go=golden are used. The
features company, exhaust and color are symbolic, the feature price is real data and
the features comfort and safety are fuzzy. The obtained results are shown in Table 2.

In this experiment, $\Psi$ was used as the dissimilarity function for FCMD, and
the functions $d(A_k, B_k)$ defined by Yang for quantitative and
qualitative features were used as comparison criteria. In other words, the same criteria
for features were used in both algorithms.

The results shown in Table 2 for MVFCM were taken from Yang et al. (2003).



Table 1. Data set of automobiles

No.  Brands         Company      Exhaust (L)  Price (NT$ 10000)  Color          Comfort     Safety
1    Virage         China-Motor  1.8          63.9               W,S,D,R,B      [10,0,2,2]  [9,0,3,3]
2    New Lancer     China-Motor  1.8          51.9               W,S,D,R,G      [6,0,2,2]   [6,0,3,3]
3    Galant         China-Motor  2.0          71.8               W,S,R,G,P,Gr   [12,4,2,0]  [15,5,3,0]
4    Tierra Activa  Ford         1.6          46.9               W,S,D,R,G,Go   [6,0,2,2]   [6,0,3,3]
5    M2000          Ford         2.0          64.6               W,S,D,G,Go     [8,0,2,2]   [9,0,3,3]
6    Tercel         Toyota       1.5          45.8               W,S,R,G        [4,4,0,2]   [6,0,3,3]
7    Corolla        Toyota       1.8          74.3               W,S,D,R,G      [12,4,2,0]  [12,0,3,3]
8    Premio G2.0    Toyota       2.0          72.9               W,S,D,G        [10,0,2,2]  [15,5,3,0]
9    Cerfiro        Yulon-Motor  2.0          69.9               W,S,D          [8,0,2,2]   [12,0,3,3]
10   March          Yulon-Motor  1.3          39.9               W,R,G,P        [4,4,0,2]   [3,5,0,3]



As the dissimilarity matrices are symmetric, both triangular matrices are shown in
a single matrix in Table 3: the lower triangle a) is calculated using expression (1) and
the upper triangle b) is calculated using expression (5). The values in b) are values of
$\Psi$ normalized in [0, 1].
Table 2. Clusters obtained with MVFCM and FCMD for mixed data.

        MVFCM               FCMD
Data    μ1j      μ2j        μ1j      μ2j
1       0.9633   0.0367     0.9215   0.0785
2       0.9633   0.0367     0.0000   1.0000
3       0.9951   0.0049     0.9959   0.0041
4       0.0966   0.9034     0.0020   0.9980
5       0.9951   0.0049     0.9561   0.0439
6       0.0135   0.9865     0.0047   0.9953
7       0.9633   0.0367     0.9959   0.0041
8       0.9951   0.0049     0.9978   0.0022
9       0.9951   0.0049     1.0000   0.0000
10      0.0185   0.9815     0.0258   0.9742



Table 3. Dissimilarity matrices: a) lower triangle, expression (1); b) upper triangle, expression (5).

      1       2       3       4       5       6       7       8       9       10
1     0.0     0.0213  0.0132  0.0397  0.0006  0.0471  0.0156  0.0148  0.0062  0.0820
2     676.1   0.0     0.0646  0.0031  0.0220  0.0055  0.0725  0.0677  0.0460  0.0205
3     420.8   2046.0  0.0     0.0929  0.0133  0.1025  0.0023  0.0009  0.0041  0.1517
4     1257.2  101.1   2943.3  0.0     0.0412  0.0009  0.1039  0.0974  0.0719  0.0085
5     19.3    698.3   423.5   1305.5  0.0     0.0480  0.0152  0.0138  0.0047  0.0839
6     1492.9  175.1   3248.4  31.0    1520.5  0.0     0.1142  0.1073  0.0801  0.0059
7     494.7   2297.0  73.0    3293.1  482.5   3619.2  0.0     0.0025  0.0046  0.1666
8     471.1   2145.9  31.2    3086.2  438.3   3400.0  79.7    0.0     0.0031  0.1582
9     197.4   1457.4  131.8   2277.8  149.6   2538.1  147.8   99.8    0.0     0.1259
10    2596.7  649.5   4807.1  269.3   2657.6  187.2   5277.9  5011.9  3987.4  0.0



The main characteristic of the fuzzy c-means algorithm is that it builds clusters where
objects with low dissimilarity obtain a high membership degree in the same cluster,
while objects that are relatively distinct obtain high membership degrees in different
clusters.

With MVFCM, object 2 obtains a high membership degree in cluster 1, but it has low
dissimilarity with the objects having a high membership degree in cluster 2 (see Table
3), i.e. the description of object 2 is more similar to the descriptions of
objects 4, 6 and 10 (see Table 2). Therefore, object 2 should have a high
membership degree in cluster 2, so the MVFCM algorithm does not build clusters
with the characteristic mentioned above. The membership degrees for MVFCM are
calculated using expression (2), which is a function of the dissimilarity
between the object to be classified and the cluster centers. In the example, object 2
obtains a high membership in cluster 1 because it is less dissimilar to the center
of cluster 1 than to the center of cluster 2. The cluster centers thus play a
determinant role in these dissimilarity values; therefore the obtained c-partitions
depend on these centers.

The FCMD, unlike the MVFCM, builds clusters which satisfy the characteristic
mentioned above, so object 2 obtains a high membership degree in cluster 2.
The membership degrees for FCMD are evaluated using expressions (7a) and (7b).
These expressions are also a function of the dissimilarity between the object to be
classified and the representative objects. But in this case, the representative objects
are objects in the sample. So, if two objects have low dissimilarity with the representative
object, then they must have low dissimilarity with each other. In this example, object 2 is
precisely the representative object of cluster 2.
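
The observation about object 2 can be checked directly on the published
numbers. The sketch below copies the memberships of Table 2 and the expression
(1) dissimilarities of object 2 from Table 3, assigns every object to the
cluster of maximum membership, and compares the mean dissimilarity of object 2
to each MVFCM cluster. The helper names are hypothetical.

```python
# (mu_1j, mu_2j) for MVFCM and for FCMD, copied from Table 2.
table2 = {
    1: ((0.9633, 0.0367), (0.9215, 0.0785)),
    2: ((0.9633, 0.0367), (0.0000, 1.0000)),
    3: ((0.9951, 0.0049), (0.9959, 0.0041)),
    4: ((0.0966, 0.9034), (0.0020, 0.9980)),
    5: ((0.9951, 0.0049), (0.9561, 0.0439)),
    6: ((0.0135, 0.9865), (0.0047, 0.9953)),
    7: ((0.9633, 0.0367), (0.9959, 0.0041)),
    8: ((0.9951, 0.0049), (0.9978, 0.0022)),
    9: ((0.9951, 0.0049), (1.0000, 0.0000)),
    10: ((0.0185, 0.9815), (0.0258, 0.9742)),
}

# Expression (1) dissimilarities between object 2 and the others (Table 3, part a).
d_object2 = {1: 676.1, 3: 2046.0, 4: 101.1, 5: 698.3, 6: 175.1,
             7: 2297.0, 8: 2145.9, 9: 1457.4, 10: 649.5}


def crisp_clusters(algorithm_index):
    """Assign each object to the cluster with maximum membership."""
    clusters = {1: [], 2: []}
    for obj, by_algorithm in sorted(table2.items()):
        mu1, mu2 = by_algorithm[algorithm_index]
        clusters[1 if mu1 >= mu2 else 2].append(obj)
    return clusters


def mean_dissimilarity_to(objects):
    """Mean dissimilarity between object 2 and the given objects (2 excluded)."""
    others = [o for o in objects if o != 2]
    return sum(d_object2[o] for o in others) / len(others)


mvfcm = crisp_clusters(0)
fcmd = crisp_clusters(1)
mean_to_cluster1 = mean_dissimilarity_to(mvfcm[1])
mean_to_cluster2 = mean_dissimilarity_to(mvfcm[2])
```

On these numbers the two partitions differ only in object 2, which is on
average much closer to objects 4, 6 and 10 than to the rest of cluster 1.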

In addition, with MVFCM the objects 1, 2 and 7 obtain the same membership degree in cluster
1, and then, according to the fuzzy c-means classification strategy, the
descriptions of these objects must be similar or equal. However, the dissimilarities
between objects 1, 2 and 7 are very different: the dissimilarity between 1 and 2 is
676.1, the dissimilarity between 2 and 7 is 2297.0 and the dissimilarity between 1 and
7 is 494.7 (see Table 3). This shows that the way in which the cluster centers are
calculated in the MVFCM algorithm allows objects having low dissimilarity
with the cluster center to be very dissimilar among themselves. In the case of the FCMD
algorithm, object 2 has a high membership degree in cluster 2, and objects
1 and 7 have different membership degrees in cluster 1 (see Table 2).

When objects have the same membership degree in a cluster for the FCMD algorithm,
for example objects 3 and 7 in cluster 1, the dissimilarity between them
(73.0357) is very low (see Table 3). Again, this situation occurs because the
representative object is an object in the sample.



5 Conclusions 

The FCMD algorithm allows using comparison criteria defined by the specialist 
according to the specific context of a practical problem. In addition, this algorithm 
evaluates “cluster centers” called representative objects, which are objects in the 
sample instead of a fictitious element as occurs in the MVFCM algorithm. Also, as 
we can observe in the definition of comparison criteria, the symbol “?” was 
introduced to denote missing data, then the FCMD algorithm allows working with 
databases that contain incomplete descriptions of objects. 

We can observe that the MVFCM algorithm builds clusters containing objects 
which have high membership degree to a cluster but with low dissimilarity with 
objects belonging with high membership degree to other clusters. On the other hand, 
the FCMD algorithm builds clusters where the objects with high membership degree 
to a cluster have low dissimilarity among them. 

Based on this analysis, we can say that the FCMD algorithm is a more flexible
alternative for solving fuzzy unsupervised classification problems where mixed
and missing data appear.

Abstract. In this paper we propose the extended star clustering algorithm
and compare it with the original star clustering algorithm. We introduce
a new concept of star and, as a consequence, obtain different star-shaped
clusters. Evaluation experiments on TREC data show that the proposed
algorithm outperforms the original algorithm. Our algorithm is independent
of the data order and obtains a smaller number of clusters.



1 Introduction 

Clustering algorithms are widely used for document classification, clustering of
genes and proteins with similar functions, event detection and tracking on a 
stream of news, image segmentation and so on. For a good overview see [1,2]. 
Given a collection of n objects characterized by m features, clustering algorithms 
try to construct partitions or covers of this collection. The similarity among the 
objects in the same cluster should be maximum, whereas the similarity among 
objects in different clusters should be minimum. 

One of the most important problems in recent years is the enormous increase
in the amount of unorganized data. Consider, for example, the web or the flow
of news in newspapers. We need methods for organizing information in order to
highlight the topic content of a collection, detect new topics and track them.
The star clustering algorithm [3] was proposed for these tasks, and three scalable
extensions of this algorithm are presented in [4]. The star method outperforms
existing clustering algorithms such as single link [5], average link [6] and
k-means [7] in the information organization task, as can be seen in [3]. However,
the clusters obtained by this algorithm depend on the data order, and it can
obtain “illogical” clusters.

In this paper we propose a new clustering method that solves some of these
drawbacks. We define a new concept of star and, as a consequence, obtain
different star-shaped clusters. Both algorithms were compared using TREC data,
and the experiments show that our algorithm outperforms the original star
clustering algorithm.

The rest of the paper is organized as follows. Section 2 describes the star
clustering algorithm and shows its drawbacks. Section 3 describes the proposed
algorithm, and the experimental results are shown in Section 4. Finally,
conclusions are presented in Section 5.

2 Star Clustering Algorithm

The star algorithm differs from the Scatter-Gather [8] and Charikar [9] algorithms
in that it does not impose a fixed number of clusters as a constraint on
the solution. Besides, it guarantees a lower bound on the similarity between the
objects in each cluster if the representation space has metric properties. The
clusters created by the algorithm can overlap. This is a desirable feature
in information organization problems, since documents can have multiple
topics.

Two objects are $\beta_0$-similar if their similarity is greater than or equal to
$\beta_0$, where $\beta_0$ is a user-defined parameter. We call $\beta_0$-similarity
graph the undirected graph whose vertices are the objects to cluster and in which
there is an edge from vertex $o_i$ to vertex $o_j$ if $o_j$ is $\beta_0$-similar to
$o_i$. Finding the minimum vertex cover of a graph is an NP-complete problem. This
algorithm is based on a greedy cover of the $\beta_0$-similarity graph by star-shaped
subgraphs. A star-shaped subgraph of $l + 1$ vertices consists of a single star and
$l$ satellite vertices, where there exist edges between the star and each of the
satellite vertices. The stars are the objects with highest connectivity. The isolated
objects in the $\beta_0$-similarity graph are also stars. The algorithm guarantees a
pairwise similarity of at least $\beta_0$ between the star and each of the satellite
vertices, but such similarity is not guaranteed between satellite vertices. Another
characteristic of this algorithm is that two stars are never adjacent.

The star algorithm stores the neighbors of each object in the $\beta_0$-similarity
graph. Each object is marked as star or as satellite. The main steps of the
algorithm are shown in Algorithm 1.



Algorithm 1 Star clustering algorithm.

Calculate all similarities between each pair of objects to
    construct the $\beta_0$-similarity graph
Let N(o) be the neighbors of each object o in the $\beta_0$-similarity
    graph
Let each object o initially be unmarked
Sort the objects by degree |N(o)|
While an unmarked object exists:
    Take the highest degree unmarked object o
    Mark o as star
    For o' in N(o):
        Mark o' as satellite
For each object o marked as star:
    Add a new cluster {o} ∪ N(o)
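The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `star_clustering`, the representation of the similarities as a square matrix, and the tie-breaking rule (lowest index first, which plays the role of the data order) are assumptions made for the example.

```python
def star_clustering(similarity, beta0):
    """Greedy star cover of the beta0-similarity graph (sketch of Algorithm 1).

    similarity: square matrix (list of lists) of pairwise similarities (assumed input).
    beta0: similarity threshold.
    """
    n = len(similarity)
    # N(o): objects whose similarity to o reaches the threshold beta0.
    neighbors = {
        i: {j for j in range(n) if j != i and similarity[i][j] >= beta0}
        for i in range(n)
    }
    unmarked = set(range(n))
    stars = []
    while unmarked:
        # Highest-degree unmarked object; ties are broken by index, i.e. by
        # the order of the data (an assumption of this sketch).
        star = min(unmarked, key=lambda o: (-len(neighbors[o]), o))
        stars.append(star)
        unmarked.discard(star)
        unmarked -= neighbors[star]  # its neighbors become satellites
    # One cluster per star: the star together with all of its neighbors.
    return [{s} | neighbors[s] for s in stars]


# Path graph A - B - C: B has the highest degree and becomes the only star.
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.9],
       [0.1, 0.9, 1.0]]
print(star_clustering(sim, 0.5))
```

Because ties are resolved by position, reordering objects of equal degree can change which of them becomes a star.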



The time complexity of the algorithm is $O(n^2 m)$, since it must calculate the
similarities between all objects, each with m features.

The star clustering algorithm has some drawbacks. First, the obtained clusters
depend on the order of the objects. If two or more neighboring objects with
the same degree exist, only the first of them in the arrangement is a star. This
problem is illustrated in Figure 1, where the dark circles are the
obtained stars and the clusters are outlined. In the figure on the left, the star
algorithm takes object B first, thus B is a star and A, C are satellites. However,
if object A (or C) is the first in the arrangement, the algorithm obtains the
clusters shown in the figure on the right. As we can see, the obtained clusters
are different.



The second main drawback of the star algorithm is that it can produce illogical
clusters, because two stars are never neighbors. Figure 2 shows this problem:
object B should be a star, and its neighbors with lower degree should not be stars.



3 Extended Star Clustering Algorithm 

In our algorithm we make two main changes with respect to the star clustering 
algorithm mentioned above. The complement degree of an object o is the degree 
of o taking into account its neighbors not included yet in any cluster, namely: 




Fig. 1. Dependency of the data order. 




Fig. 2. Illogical clusters. 



$CD(o) = |N(o) \setminus Clu|$



where Clu is the set of objects already clustered. As we can see, the complement
degree of an object decreases during the clustering process as more objects are
included in clusters.

Besides, an object o is considered a star if it has at least one neighbor o' with
degree less than or equal to that of o that satisfies one of the following conditions:

— o' has no star neighbor.

— The highest degree of the stars that are neighbors of o' is not greater than 
the degree of o. 

It is worth mentioning that these conditions are necessary but not sufficient.
That is, some objects that satisfy the previous conditions may not be selected
as stars by the algorithm.

The main steps of our algorithm are shown in Algorithm 2. 



Algorithm 2 Extended star clustering algorithm.

Calculate all similarities between each pair of objects to
    construct the $\beta_0$-similarity graph
Let N(o) be the neighbors of each object o in the $\beta_0$-similarity
    graph
For each isolated object o (|N(o)| = 0):
    Create the singleton cluster {o}
Let L be the set of non-isolated objects
Calculate the complement degree of each object in L
While a non-clustered object exists: (*)
    Let M_0 be the subset of objects of L with maximum
        complement degree
    Let M be the subset of objects of M_0 with maximum degree
    For each object o in M:
        If o satisfies the condition to be a star, then
            If {o} ∪ N(o) does not exist:
                Create a cluster {o} ∪ N(o)
    Delete the processed objects from L (**)
    Update the complement degree of the objects in L



In step (**) we can delete from L either the objects in M or all the objects
already clustered. We call these variations the unrestricted and restricted
versions of the algorithm, respectively. In the restricted version, only the
objects not yet clustered can be stars. Each version has advantages and
disadvantages. The unrestricted version has more possibilities to find the best
stars. The restricted version is faster but can obtain illogical clusters.
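A minimal Python sketch of Algorithm 2 is given below, under several assumptions that are not stated in the text: the graph is passed as a dictionary `neighbors` mapping each object to its set of neighbors, objects are comparable (integers here), the objects in M are removed from L in both versions so that the loop always terminates, and the loop also stops when L becomes empty. The function name `extended_star` and the helper `can_be_star` are hypothetical.

```python
def extended_star(neighbors, restricted=False):
    """Sketch of the extended star algorithm on a beta0-similarity graph.

    neighbors: dict mapping each object to the set of its neighbors (assumed input).
    restricted: if True, already clustered objects are also removed from L.
    """
    degree = {o: len(ns) for o, ns in neighbors.items()}
    # Isolated objects form singleton clusters.
    clusters = [{o} for o in neighbors if degree[o] == 0]
    clustered = {o for o in neighbors if degree[o] == 0}
    stars = set()
    L = {o for o in neighbors if degree[o] > 0}

    def can_be_star(o):
        # Some neighbor o2 with degree <= degree(o) must either have no star
        # neighbor, or have no star neighbor of higher degree than o.
        for o2 in neighbors[o]:
            if degree[o2] > degree[o]:
                continue
            star_degrees = [degree[s] for s in neighbors[o2] & stars]
            if not star_degrees or max(star_degrees) <= degree[o]:
                return True
        return False

    # Assumption: also stop when L is empty, to guarantee termination.
    while L and len(clustered) < len(neighbors):
        cd = {o: len(neighbors[o] - clustered) for o in L}  # complement degrees
        max_cd = max(cd.values())
        M0 = {o for o in L if cd[o] == max_cd}
        max_deg = max(degree[o] for o in M0)
        M = {o for o in M0 if degree[o] == max_deg}
        for o in sorted(M):
            if can_be_star(o):
                stars.add(o)
                cluster = {o} | neighbors[o]
                if cluster not in clusters:
                    clusters.append(cluster)
                clustered |= cluster
        L -= M  # step (**), unrestricted variant
        if restricted:
            L -= clustered  # step (**), restricted variant
    return clusters


# Objects 0 and 1 are adjacent and have the same degree: both become stars.
tie = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}
print(extended_star(tie))
```

In this example the two tied objects are selected together, so the result does not depend on which of them comes first in the data.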

The time complexity of the algorithm is $O(n^2 m)$ and is determined by its
first step: the calculation of the similarities between objects. The complexity
of the cycle (*) is $O(n^2)$. In the worst case, this cycle is repeated $\log_2 n$
times. The most expensive step in it is the update of the complement degree of the
objects, which can be accelerated with a table containing the differences between
all combinations of blocks of bits of size $\log_2 n$. This table is not needed if
$\log_2 n$ is less than or equal to the word size of the computer processor, because
in this case the difference operation between blocks of bits is $O(1)$.
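The bit-level difference mentioned above can be illustrated with Python integers used as bit sets. This is a sketch under the assumption that bit i of a mask is set when object i belongs to the set; the function name `complement_degree_bits` is hypothetical.

```python
def complement_degree_bits(neighbor_mask, clustered_mask):
    """Complement degree |N(o) \\ Clu| computed on bit masks.

    Bit i of neighbor_mask is set when object i is a neighbor of o;
    bit i of clustered_mask is set when object i is already clustered.
    """
    remaining = neighbor_mask & ~clustered_mask  # set difference on bits
    return bin(remaining).count("1")             # number of bits still set


# Neighbors {1, 2, 3}; object 2 is already clustered, so two neighbors remain.
print(complement_degree_bits(0b1110, 0b0100))
```

Each machine-word block of the masks is processed with one AND-NOT operation, which is the constant-time difference the paragraph refers to.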

The proposed algorithm creates overlapped clusters and guarantees a pairwise
similarity of at least $\beta_0$ between the star and its neighbors. Unlike the
original star clustering algorithm, the obtained clusters are independent of the
data order. If two or more objects with the highest connectivity exist, our
algorithm selects all of them as stars. Besides, selecting stars by complement
degree covers the data quickly and reduces the overlap among the clusters.

The extended star clustering algorithm solves the problems of the original
star clustering algorithm described in Section 2. An object is considered a star
even if it has a star neighbor. Moreover, the second condition in the new star
concept guarantees that if an object is a neighbor of two possible stars, both will
be stars. Thus, the result is independent of the order of the objects.

Figure 3a) shows the stars obtained by our algorithm for the same case as
in Figure 1.





Fig. 3. Solutions: a) Order problem, b) Illogical clusters. 



The unrestricted version of our algorithm does not form illogical clusters
because it allows neighboring stars. Figure 3b) shows the stars obtained by the
unrestricted version for the same case as in Figure 2.

4 Experimental Results 

In order to evaluate the performance of our algorithm, we compared it with the
original star clustering algorithm. We used data (in Spanish) from the TREC-4
and TREC-5 conferences as our testing medium [11]. The TREC-4 collection
contains a set of “El Norte” newspaper articles from 1994. This collection has 5828
articles classified into 50 topics. The TREC-5 collection consists of articles from
the AFP agency from 1994 to 1996, classified into 25 topics. We have only the data
from 1994, for a total of 695 classified articles.

The documents are represented using the traditional vector model. The
terms of the documents represent the lemmas of the words appearing in the texts.
Stop words, such as articles, prepositions and adverbs, are removed from the
document vectors. Terms are statistically weighted using the normalized term
frequency (TF). Moreover, we use the traditional cosine measure to compare the
documents.
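The document representation described above can be sketched as follows. The text does not say how the term frequency is normalized; this example assumes division by the document length, and the function names `tf_vector` and `cosine` are hypothetical. Lemmatization and stop-word removal are assumed to have been applied beforehand.

```python
import math
from collections import Counter


def tf_vector(terms):
    """Term frequencies divided by the document length (assumed normalization)."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine(u, v):
    """Cosine measure between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_u * norm_v)


doc_a = tf_vector(["gol", "equipo", "gol"])
doc_b = tf_vector(["equipo", "gol", "liga"])
print(cosine(doc_a, doc_b))
```

Two documents would be joined by an edge of the similarity graph when this value reaches the chosen threshold.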

To evaluate the quality of our algorithm, we partitioned the “El Norte”
collection into five sub-collections. Each sub-collection is composed of articles
related to 10 distinct topics. Adding the AFP collection, we have a total of 6
collections. The general characteristics of these collections are summarized in
Table 1.



Table 1. Description of collections

Collection   # of documents   Topics       Collection   # of documents   Topics
eln-1        1534             SP1-SP10     eln-4        811              SP31-SP40
eln-2        1715             SP11-SP20    eln-5        829              SP41-SP50
eln-3        1732             SP21-SP30    afp          695              SP51-SP75



To evaluate the clustering results, we use the F1-measure [12]. This measure
compares the system-generated clusters with the manually labeled topics. It is
widely applied in Information Retrieval systems, and it combines the precision
and recall factors. The F1-measure of cluster number j with respect to
topic number i can be evaluated as follows:

$$F1(i,j) = 2\,\frac{n_{ij}}{n_i + n_j}$$

where $n_{ij}$ is the number of common members in topic i and cluster j, $n_i$
is the cardinality of topic i, and $n_j$ is the cardinality of cluster j.

To define a global measure, first each topic must be mapped to the cluster
that produces the maximum F1-measure:

$$\sigma(i) = \arg\max_{j} \{F1(i,j)\}$$

Hence, the overall F1-measure is calculated as follows:

$$F1 = \frac{1}{S} \sum_{i=1}^{N} n_i \, F1(i, \sigma(i)), \qquad S = \sum_{i=1}^{N} n_i$$

where N is the number of topics.
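As an illustration, the measure can be computed as below. This sketch assumes the pairwise formula F1(i, j) = 2 n_ij / (n_i + n_j), with topics and clusters given as Python sets of document identifiers; the function names `f1` and `overall_f1` are hypothetical.

```python
def f1(topic, cluster):
    """F1 of a cluster with respect to a topic: 2 * n_ij / (n_i + n_j)."""
    common = len(topic & cluster)
    return 2 * common / (len(topic) + len(cluster))


def overall_f1(topics, clusters):
    """Average of the best F1 per topic, weighted by topic cardinality."""
    total = sum(len(t) for t in topics)  # S in the text
    return sum(len(t) * max(f1(t, c) for c in clusters) for t in topics) / total


topics = [{1, 2, 3}, {4, 5}]
clusters = [{1, 2}, {3, 4, 5}]
# Each topic is best matched with F1 = 0.8, so the overall value is 0.8.
print(overall_f1(topics, clusters))
```

Taking the maximum over clusters for each topic corresponds to mapping the topic to its best cluster.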

In our experiments we compare the original star clustering algorithm with
the unrestricted and restricted versions of the proposed algorithm. Table 2 shows
the best F1-measure obtained by the algorithms for optimized values of $\beta_0$ in
the 6 collections. As we can see, both versions of our algorithm outperform the
original star algorithm in all of these collections except eln-5. Besides, in most
collections our algorithm obtains fewer clusters than the original star algorithm.
This is another important result, because our algorithm achieves a greater
precision.

If we compare the results obtained by the original star algorithm with the
restricted version of our algorithm, we can see the effect of using the complement
degree and the new star concept. The restricted version always has better
or equal F1 values with a smaller number of clusters. The unrestricted version
outperforms the restricted one in three cases, whereas it performs worse
in one case. We can expect the unrestricted version to have the best performance
in most cases; hence, if the main goal is to obtain the best clusters, we
recommend the unrestricted version of the algorithm. On the other hand, if we
want to reduce the number of clusters or the execution time, we should use the
restricted version of the algorithm.



Table 2. Experimental results

Algorithm     Original Star          Extended Star (restricted)   Extended Star (unrestricted)
Collection    F1     # of clusters   F1     # of clusters         F1     # of clusters
afp           0.76   136             0.77   99                    0.78   105
eln-1         0.61   139             0.63   59                    0.63   81
eln-2         0.66   62              0.67   59                    0.72   61
eln-3         0.53   109             0.53   52                    0.59   67
eln-4         0.55   62              0.58   68                    0.58   43
eln-5         0.74   57              0.74   48                    0.72   59



5 Conclusions 

In this paper we presented a new clustering algorithm, named the extended star 
clustering algorithm. We use the complement degree of an object and we define a 
new concept of star. As a consequence, we obtain different star-shaped clusters. 
Our algorithm solves the problems of dependency of data order and illogical 
clusters of the original star algorithm. 

We compared the proposed algorithm with the original star algorithm on several
collections of TREC data. Our algorithm obtains a better performance in
these collections and produces fewer clusters.

The new algorithm can be used in tasks such as information organization, 
browsing, topic tracking and new topic detection. Besides, our algorithm can be 
useful in other areas of Pattern Recognition. 

As future work, we will construct a parallel version of our algorithm to
process very large data sets.

Abstract. The purpose of this paper is to discuss feature selection
methods. We present two common feature selection approaches: statistical
methods and artificial intelligence approaches. Statistical methods are presented
as antecedents of classification methods with specific techniques for the choice of
variables, because we intend to test the feature selection techniques in
classification problems. We show the artificial intelligence approaches from
different points of view. We also present the use of information theory to
build decision trees. Instead of using Quinlan's Gain, we discuss other
alternatives to build decision trees. We introduce two new feature selection
measures: the MLRelevance formula and the PRelevance. These criteria maximize
the heterogeneity among elements that belong to different classes and the
homogeneity among elements that belong to the same class. Finally, we
compare different feature selection methods by means of the classification of
two medical data sets.



1 Introduction 

Pattern recognition is an interdisciplinary science with strong connections
to Mathematics, Engineering and Computer Science. The following problems can
be solved by means of pattern recognition techniques:

• search of effective object descriptions and 

• classification problems. 

In classification problems, the studied objects are described in terms of a set of
features. Each feature ($x_i$) has a set ($M_i$) of acceptable values and a comparison
criterion ($\delta_i$) associated with it. Suppose that a given training sample, in the
framework of a supervised classification problem, has a (training) matrix
representation $I_0(C_1, C_2, \ldots, C_r)$; that is, the object descriptions
($O_1, O_2, \ldots, O_p$) are stored in a matrix with as many columns as features and
as many rows as objects in the sample, and they are split into groups corresponding
to their respective classes ($C_1, C_2, \ldots, C_r$). Likewise, the succession
$I(O'_1, O'_2, \ldots, O'_q)$ of standard descriptions of the objects
($O'_1, O'_2, \ldots, O'_q$) such that $O'_j \notin I_0(C_1, C_2, \ldots, C_r)$ with
$1 \le j \le q$ is called the control matrix.

Usually the feature selection problem appears in classification problems and in
problems of searching for effective object descriptions, as a necessary step to
reduce the dimension of the initial representation space of the objects and to
simplify the classification process. The problem of selecting features for
classification consists of finding an algorithm $\varphi$ such that:

First:

$$K(\varphi(x_1(O), x_2(O), \ldots, x_n(O))) = K(x_1(O), x_2(O), \ldots, x_n(O)) \quad \forall O \in I_0(C_1, C_2, \ldots, C_r)$$

where K is a classification criterion. This means that the algorithm $\varphi$
reduces the dimension of the space without affecting the membership of each object
in its respective class. In other words, using the algorithm, the membership
r-tuple of the initial training matrix remains constant although the dimension of
the space is smaller than the initial one.

Second: Given a classification algorithm A and a function $\Phi$ that measures
its quality,

$$\Phi[A(I_0(C_1, \ldots, C_r), I(O'_1, \ldots, O'_q))] \le \Phi[A(I_0(C_1, \ldots, C_r), \Psi(I(O'_1, \ldots, O'_q)))]$$

By $\Psi(I(O'_1, \ldots, O'_q))$ we denote the projection of the control matrix onto
a new space. This new space is obtained by applying $\varphi$ to the initial space.
Theoretically, having more features should give us more discriminating power.
However, the real world provides us with many reasons why this is not generally
the case [1]:

• First: the induction algorithm complexity grows dramatically with the number 
of features. 

• Second: the irrelevant and redundant features also cause problems in the 
classification context as they may confuse the learning algorithm by helping to 
obscure the distributions of the small set of truly relevant features for the task at 
hand. 

The traditional techniques and criteria for feature selection are analyzed in
detail in the following sections of this paper. The second section presents the
most popular statistical and artificial intelligence techniques used to solve
feature selection problems. In the third section, we introduce two new criteria
related to the relevance or irrelevance of features, which are valid for any
feature selection technique. Finally, we show comparison results of algorithms
that use different feature selection criteria.



2 Statistical, Artificial Intelligence and Logical Combinatorial 
Pattern Recognition Techniques for Feature Variable Selection 

2.1 Statistical Techniques for Feature Variable Selection

Feature selection appears in classical statistics in relation to all the
techniques of multivariate analysis, starting from the most elementary ones: the
analysis of variance (ANOVA) and regression. In fact, multiple linear
regression theory gave rise to new feature selection procedures that influenced the
later development of multivariate statistics. We refer to the stepwise
methods as a way to obtain the best regression equation among all the possible ones,
keeping in mind the correlation among variables. The stepwise procedures are
easily extended to statistical classification techniques: discriminant analysis,
logistic regression, decision trees, etc. [2]

There is no strong criterion for dividing the classification procedures into
separate groups. In a sense, they are always extensions of some statistical
techniques such as discriminant analysis, the methods based on decision trees (the
CHAID technique: Chi-square Automatic Interaction Detector), the methods of density
estimation (KNN: k-nearest neighbors), or the techniques of hierarchical group
formation.

These four procedures (linear discrimination, decision trees, k-nearest neighbors
and clustering) are prototypes for four kinds of classification procedures. Not
surprisingly, they have been refined and extended, but they still represent the major
strands in current classification practice and research. Thus, they may provide a
good classification criterion. However, in [3] the authors preferred to group the
methods under the more traditional headings of classical statistics, modern
statistical techniques, machine learning and neural networks.

2.2 Artificial Intelligence Techniques for Feature Variable Selection 

There are many applications of heuristic search methods to feature
variable selection problems [4]. To characterize feature selection algorithms, four
issues should be defined [5].

Other approaches to the feature selection problem are based on applying a
weighting function to the features. Weighting schemes are generally easier to
implement than other machine learning methods, but they are frequently more
difficult to understand because they usually work as a black box.

The Perceptron is a well-known feature weighting method, which adds or eliminates
weights on the linear threshold unit in response to errors that occur during the
classification process. Many learning algorithms, such as the back-propagation
algorithm and the least-mean-square algorithm, have been well studied. The results
of Perceptron weighting techniques can be affected when the number of irrelevant
features grows. To decrease the sensitivity of the Perceptron algorithm, the Winnow
algorithm is proposed in [5].

Another approach to the problem of relevant features in classification is that of
the filter methods [6]. This viewpoint separates the feature selection process from
the induction process. These methods preprocess the training data and filter out
the irrelevant features before the induction process occurs. The filter methods work
independently of the induction methods, so they can be used in combination with
different induction methods. Besides, the filter methods evaluate each feature based
on its correlation with the set of classes, choosing a suitable number of relevant
features. Two of the best-known filter methods for feature selection are RELIEF [7]
and FOCUS [8].

Another feature selection methodology, which has recently received much
attention, is the wrapper model. This model searches through the space of feature
subsets using the estimated accuracy from an induction algorithm as the measure of
goodness for a particular feature subset [6]. Actually, the wrapper methods are well
known in statistics and pattern recognition. The principal notion in the wrapper
methods is to determine the feature subset that allows better estimations than
separate measures. The major disadvantage of wrapper methods with respect to filter
methods is their high computational cost. Like the filter methods, the wrapper
methods can be used in combination with different induction methods. In fact, the
OBLIVION [9] wrapper algorithm combines the wrapper notion with the
nearest-neighbor method.

The embedded approaches to determining relevant features are popular methods
too. A clear example of feature selection methods embedded within a basic
induction algorithm is the “methods for inducing logical descriptions”. For these
algorithms the space of hypotheses is described by a partial ordering, and the
algorithms use this ordering to organize their search for concept descriptions. The
core of these algorithms is to add or remove features from the concept description in
response to prediction errors on new instances. For example, recursive partitioning
methods for induction, such as Quinlan’s ID3 [10], C4.5 [11] and CART [12], carry
out a greedy search through the space of decision trees, at each node using
an evaluation criterion to choose the feature having the best ability to discriminate
among classes.

Information theory is one approach to information uncertainty
problems; however, it is not a tool for manipulating uncertain knowledge. Instead, it
is a tool for measuring uncertainty. In information theory, uncertainty is measured
by a quantity called “entropy”. It is similar to, but not the same as, the concept of
entropy in physics [13]. An example of the entropy computation is the selection of
variables when building decision trees. In fact, Quinlan proposed the ID3 algorithm
to induce classification rules in the form of a decision tree [11, 14]. More recently,
Quinlan introduced the algorithms C4.5 [11, 14] and C5.0 [15]. These algorithms
improve on ID3 because they work with numeric and symbolic data and
handle cases with missing values.
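The entropy-based selection of a variable can be illustrated with the standard Shannon entropy and information gain. The sketch below uses the textbook definitions rather than code from the cited works, and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the samples on a feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


labels = ["+", "+", "-", "-"]
# A feature that separates the classes perfectly removes all uncertainty,
# while a feature unrelated to the classes removes none.
print(information_gain(["a", "a", "b", "b"], labels))
print(information_gain(["a", "b", "a", "b"], labels))
```

A decision tree builder of this kind would pick, at each node, the feature with the largest gain.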

In the information theory approach many other measures have been proposed; for
example, instead of using Quinlan’s Gain, Mantaras [16] proposes two feature
selection measures based on a distance between partitions.



2.3 Logical Combinatorial Pattern Recognition and Testor Theory in the
Feature Variable Selection

Some problems related to feature selection can be solved in the context of
testor theory. This is a branch of Mathematical Logic that began in the Soviet Union
at the end of the 1950s. I. A. Cheguis and S. V. Yablonskii [17] were the first
researchers to develop this theory. Their work was motivated by the problem of
fault detection in logical schemes, particularly applied to computer logical circuits.

In the mid-1960s, Y. I. Zhuravlev adapted the testor concept to pattern
recognition [18].

Testor definition (Zhuravlev): If the complete set of features R allows us to
distinguish between objects (rows of MI) from different classes, then R is a testor.
Furthermore, any non-empty feature subset of R that satisfies this property is a
testor. Other testor concepts that improve on Zhuravlev’s original concept are
proposed in [19], [20].
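The testor property can be checked with a short Python sketch. It assumes that two descriptions are compared by strict equality on every selected feature (the text allows other comparison criteria), and the function name `is_testor` is hypothetical.

```python
def is_testor(rows, labels, features):
    """True if no two objects of different classes agree on all given features.

    rows: list of object descriptions (tuples of feature values).
    labels: class of each object.
    features: indices of the feature subset to test.
    """
    seen = {}  # projected description -> class of the first object seen with it
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        if seen.setdefault(key, label) != label:
            return False  # same projected description, different classes
    return True


rows = [(0, 1), (0, 0), (1, 1)]
labels = ["A", "B", "B"]
print(is_testor(rows, labels, [0, 1]))  # both features separate the classes
print(is_testor(rows, labels, [0]))     # feature 0 alone confuses rows 0 and 1
```

Searching for small feature subsets that keep this property is one way to pose feature selection.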



3 Two New Alternative Criteria for Feature Variable Selection 



In this section we propose two alternative criteria to choose the relevant features in 
classification problems. Some theoretical results obtained from the analysis of this 
measure of relevance are presented. 



3.1 The MLRelevance Criterion 



Suppose a feature A with acceptable values $i = 1, \ldots, k$, a set of samples S,
and $S_i$ the subset of S that contains the samples having the value i in the
feature A. Then the expression $|S_i|/|S|$ is the relative frequency of the value i
in S.

Equation 1 shows a measure that determines the relevance of the feature A:

$$R(A) = \sum_{i=1}^{k} \frac{|S_i|}{|S|} \, e^{1 - C_i} \qquad (1)$$

where R(A) is the relevance measure of the feature A on the set S, k is the number of
different values for the feature A, and $C_i$ is the number of different classes
present in the objects having the value i for the feature A.

Let us begin by stating some general aspects of our measure:

• Its principal idea is to maximize the heterogeneity among elements that belong
to different classes and the homogeneity among elements that belong to the
same class.

• $0 < R(A) \le 1$ and $\sum_{i=1}^{k} |S_i|/|S| = 1$. Consequently, the feature that
maximizes R(A) is better.

• Equation 1 is always defined for any set S, which is a good property of
this equation.
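A small Python sketch of the MLRelevance measure follows. It assumes that Equation 1 has the form R(A) = Σ (|S_i|/|S|) · e^(1 − C_i), that is, that the exponent is 1 − C_i; this reading is an assumption of the example, as is the function name `ml_relevance`.

```python
import math
from collections import defaultdict


def ml_relevance(values, labels):
    """MLRelevance of a feature, assuming the exponent of Equation 1 is 1 - C_i.

    values: value of the feature for each sample.
    labels: class of each sample.
    """
    classes_by_value = defaultdict(set)  # value i -> classes seen with it
    count_by_value = defaultdict(int)    # value i -> |S_i|
    for value, label in zip(values, labels):
        classes_by_value[value].add(label)
        count_by_value[value] += 1
    n = len(values)
    return sum(
        count_by_value[v] / n * math.exp(1 - len(classes_by_value[v]))
        for v in count_by_value
    )


labels = [0, 0, 1, 1]
print(ml_relevance(["a", "a", "b", "b"], labels))  # each value has one class
print(ml_relevance(["a", "b", "a", "b"], labels))  # each value mixes two classes
```

Under this assumption the measure reaches 1 when every feature value occurs in a single class and decreases as values mix more classes.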



3.2 The PRelevance Criterion 

Another criterion is a linear combination of the MLRelevance criterion and a
heuristic. The core of this second metric is to determine the relevance of an
attribute a as a linear combination of the relevance of the isolated attribute a and
the relevance of the groups of attributes B such that $a \in B$.

3.2.1 Preliminary Concepts

The heuristic that we use in the PRelevance computation is based on rough set
theory [21].

Let W = (U, D) be the decision system, and let $B \subseteq A$ and $S \subseteq U$.
We can approximate S using only the information contained in B by constructing the
B-lower and the B-upper approximations of S, denoted $B_*(S)$ and $B^*(S)$
respectively. A rough set is any set S, $S \subseteq U$, defined from its B-lower and
B-upper approximations [22].

We now define indiscernibility, the fundamental notion in rough set
theory. Objects that are characterized by the same information are indiscernible
(similar) in view of the information that is available.

Definition 1 (Indiscernibility): Each set of attributes B such that $B \subseteq A$
has an associated binary indiscernibility relation, denoted by $I_B$. This relation
determines which objects are indiscernible from one another:
$I_B = \{(x, y) \in U \times U : f(x, a_i) = f(y, a_i) \text{ for all } a_i \in B\}$.
If $(x, y) \in I_B$, we say that the objects x and y are B-indiscernible.

The lower approximation of a set S with respect to a set of attributes B is defined
as the collection of objects whose equivalence classes are completely contained in
the set, whereas the upper approximation is defined as the collection of objects
whose equivalence classes are at least partially contained in the set. Formally,

$$B_*(S) = \{x \in U \mid B(x) \subseteq S\} \qquad (2)$$

$$B^*(S) = \{x \in U \mid B(x) \cap S \neq \emptyset\} \qquad (3)$$

We can define the boundary region of S as:

$$BN_B(S) = B^*(S) - B_*(S) \qquad (4)$$



If the set $BN_B(S)$ is empty, then the set S is exact with respect to the
equivalence relation B. In any other case ($BN_B(S) \neq \emptyset$), the set S is
inexact (vague, rough) with respect to B. Using the lower and upper approximations
of a concept X, three regions are defined:

I Positive region: $POS(X) = B_*(X)$.

II Boundary region: $BN_B(X)$.

III Negative region: $NEG(X) = U - B^*(X)$.
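The approximations and regions above can be computed directly from the equivalence classes of the indiscernibility relation. The sketch below assumes that the decision table is a dictionary mapping each object to a dictionary of attribute values; the function names `equivalence_classes` and `lower_upper` are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Blocks of objects that share the same values on the attributes attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def lower_upper(table, attrs, target):
    """B-lower and B-upper approximations of the set target."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= target:   # block completely contained in the set
            lower |= block
        if block & target:    # block at least partially contained in the set
            upper |= block
    return lower, upper


table = {1: {"a": 0}, 2: {"a": 0}, 3: {"a": 1}, 4: {"a": 2}}
lower, upper = lower_upper(table, ["a"], {1, 3})
print(lower, upper, upper - lower)  # positive region, upper approximation, boundary
```

Objects 1 and 2 are indiscernible on attribute a, so object 1 cannot be placed in the set with certainty and falls in the boundary region.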

3.2.2 Dependences between Attributes 

Intuitively, a set of decision attributes D depends totally on a set of attributes B,
denoted by $B \Rightarrow D$, if all the values of the attributes in D are uniquely
determined by the values of the attributes in B.

In other words, D depends totally on B if there is a functional dependency
between the values of D and B [22].

Definition 2: Dependency of degree k.

D is said to depend on B to a degree k ($0 \le k \le 1$), denoted by
$B \Rightarrow_k D$, where the value k is defined by expression 5:

\[ k = \frac{|POS_B(D)|}{|U|} \tag{5} \]

where

\[ POS_B(D) = \bigcup_{X \in U/D} B_*(X) \tag{6} \]

If k = 1, D is said to depend totally on B, while if k < 1, D is said to
depend partially on B.
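
The degree of dependency can be sketched in a few lines of Python. This is only an
illustrative sketch under the standard rough-set reading of Definition 2 (the positive
region is the union of the B-lower approximations of the decision classes); the toy
table and the names `partition` and `dependency_degree` are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of the objects under the attributes in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def dependency_degree(table, cond_attrs, dec_attrs):
    """k = |POS_B(D)| / |U| for condition attributes B and decision attributes D."""
    cond_blocks = partition(table, cond_attrs)
    positive = set()
    for decision_class in partition(table, dec_attrs):
        for block in cond_blocks:
            if block <= decision_class:   # block lies in the lower approximation
                positive |= block
    return len(positive) / len(table)


# Hypothetical decision table: objects 1 and 2 agree on B but differ on the decision.
table = {
    1: {"fever": "yes", "cough": "yes", "flu": "yes"},
    2: {"fever": "yes", "cough": "yes", "flu": "no"},
    3: {"fever": "no", "cough": "yes", "flu": "no"},
    4: {"fever": "no", "cough": "no", "flu": "no"},
}
k = dependency_degree(table, ["fever", "cough"], ["flu"])
```

Here only objects 3 and 4 are classified unambiguously by B = {fever, cough}, so the
decision depends on B only partially.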

3.2.3 PRelevance Computation 

With the definitions given so far, the PRelevance of an attribute a, denoted by
RP(a), is computed as:

PRelevance RP(a) = R(a) + H(a) (7)

where R(a) is the function of equation 1 and H(a) is calculated as shown in
Algorithm 1. The attribute that maximizes RP(a) is the most relevant attribute.

Algorithm 1

Step 1: Compute the vector $R(T) = (R(a_1), R(a_2), R(a_3), \ldots, R(a_{|A|}))$,
with $T \subseteq A$.

Step 2: Determine the n best attributes, the best being those that maximize
$R(a_i)$. The result of this step is the vector
$RA = (R(a_i), R(a_j), \ldots, R(a_t))$, with $n = |RA|$.

Step 3: Form the combinations of the n attributes selected in Step 2, taken p at a
time. A vector of combinations is obtained:
$Comb = (\{a_i, a_j, a_k\}, \ldots, \{a_i, a_j, a_p\})$.
For example, if n = 4, p = 3 and the attributes selected in Step 2 are
$(a_1, a_3, a_5, a_8)$, the combination vector has

\[ C = \frac{n!}{p!\,(n-p)!} = 4 \]

components, namely
$Comb = (\{a_1, a_3, a_5\}, \{a_1, a_3, a_8\}, \{a_3, a_5, a_8\}, \{a_1, a_5, a_8\})$.

Step 4: Compute the degree of dependency of the classes with respect to each of
the combinations obtained in the previous step. The result of this step is
the vector of dependencies
$DEP = (k(Comb_1, d), k(Comb_2, d), \ldots, k(Comb_C, d))$.

Step 5: For each attribute a, compute the value of H(a) using equation 8:

\[ H(a) = \sum_{i \,:\, a \in Comb_i} k(Comb_i, d) \tag{8} \]
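
Steps 2 to 5 of Algorithm 1 can be sketched as follows. This is a hedged Python
illustration rather than the authors' implementation: the relevance values R(a) are
made up, the dependency function k(Comb, d) is replaced by a hypothetical constant
stand-in, and attributes outside the n best are assumed to receive H(a) = 0.

```python
from itertools import combinations


def h_values(relevance, n, p, dependency):
    """relevance: dict attr -> R(a); dependency: callable(combination) -> k(Comb, d)."""
    # Step 2: keep the n attributes with the largest R(a).
    best = sorted(relevance, key=relevance.get, reverse=True)[:n]
    # Step 3: all combinations of the n best attributes taken p at a time.
    combos = list(combinations(sorted(best), p))
    # Step 4: dependency degree of the decision on each combination.
    dep = [dependency(c) for c in combos]
    # Step 5: H(a) sums the dependencies of the combinations that contain a.
    h = {a: sum(k for c, k in zip(combos, dep) if a in c) for a in relevance}
    return combos, h


def prelevance(relevance, n, p, dependency):
    """RP(a) = R(a) + H(a) for every attribute."""
    _, h = h_values(relevance, n, p, dependency)
    return {a: relevance[a] + h[a] for a in relevance}


# Hypothetical R(a) values; a constant stands in for the real k(Comb, d).
relevance = {"a1": 0.9, "a2": 0.1, "a3": 0.8, "a5": 0.7, "a8": 0.6}
combos, h = h_values(relevance, n=4, p=3, dependency=lambda combo: 0.5)
rp = prelevance(relevance, n=4, p=3, dependency=lambda combo: 0.5)
most_relevant = max(rp, key=rp.get)
```

With n = 4 and p = 3 the sketch builds the four combinations of the worked example, and
each selected attribute appears in three of them. In a real run `dependency` would
compute the degree of dependency of Definition 2, which is where most of the cost lies.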

As can be seen, computing PRelevance for an attribute is very expensive: its cost
depends on |A|, n, p and |d|. These parameters depend on the real problem to be
solved. Moreover, if the PRelevance metric is used to build decision trees, this
expensive procedure is repeated and the cost of learning increases considerably.
To reduce the learning time, we propose an implementation of PRelevance on a
parallel platform (MPI, PVM).



4 Comparisons between Different Feature Selection Methods 



In this section we compare different feature selection methods using data from two
medical domains: the thyroid database provided by the Garvan Institute of Medical
Research, Sydney, and the heart database from the European Statlog project (Dept. of
Statistics and Modelling Science, Strathclyde University, 1993). Both medical
databases appear in the UCI Repository of Machine Learning Databases, University of
California [23].

We compare the percentage of correct classifications obtained by the systems C5.0
[15], KNN (IB4 implementation) [24], MLClassif (VCramer), MLClassif (Mantaras) and
MLClassif (MLRelevance). The VCramer formula is a measure of association between
variables [25] [2]. The C5.0 system, developed by Ross Quinlan, creates a decision
tree based on Quinlan’s gain. The MLClassif system, developed by our team, creates
partitions by recursively sorting the training set. To rearrange each partition an
appropriate feature is selected. To choose the most relevant feature at each step we
use Mantaras’s distance, the VCramer formula or the MLRelevance measure.

From each database we randomly create ten partition pairs (Table 1), each pair
having 75% of the elements for training and the rest for testing. We run the
algorithms on each partition. The values shown in Table 1 are the percentages of
correct classification obtained by each algorithm on the partition.
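
The partitioning protocol can be sketched as below. This is a minimal Python sketch
under stated assumptions: plain (non-stratified) random sampling, a fixed seed for
reproducibility, and a hypothetical function name `make_partition_pairs`; the text
does not say whether the original partitions were stratified by class.

```python
import random


def make_partition_pairs(n_samples, n_pairs=10, train_fraction=0.75, seed=0):
    """Return n_pairs (train_indices, test_indices) pairs over range(n_samples)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_train = round(train_fraction * n_samples)
    pairs = []
    for _ in range(n_pairs):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first 75% of the shuffled indices train the model, the rest test it.
        pairs.append((sorted(shuffled[:n_train]), sorted(shuffled[n_train:])))
    return pairs


pairs = make_partition_pairs(200)
```

Each algorithm would then be trained and evaluated once per pair, giving ten train
accuracies and ten test accuracies per algorithm and database.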

To compare the results of the algorithms we applied the Kruskal-Wallis test to each
variable; we used the Monte Carlo method to compute the significance level, with a
99% confidence interval for the significance.
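
For readers unfamiliar with the procedure, the following Python sketch computes the
Kruskal-Wallis H statistic and estimates its significance by Monte Carlo resampling of
the group labels. It is an illustration only: the accuracy values are hypothetical,
the tie correction of H is omitted for brevity, and the function names are ours, not
those of the statistical package used in the experiments.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def monte_carlo_p(groups, n_resamples=2000, seed=0):
    """Estimate the p-value of H by randomly reassigning values to groups."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled, start = [], 0
        for s in sizes:
            resampled.append(pooled[start:start + s])
            start += s
        if kruskal_h(resampled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical accuracies of three algorithms on three partitions each.
groups = [[81.0, 82.5, 83.1], [88.4, 89.0, 90.2], [95.3, 96.1, 97.8]]
h, p = monte_carlo_p(groups)
```

A small estimated p-value indicates that at least one algorithm differs significantly
from the others on that variable.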

The superscript letters used in Tables 1 and 2 denote different sets, obtained by
applying the Kruskal-Wallis test. Values with the same superscript belong to the
same set, which means that they do not differ significantly.

To compare two algorithms we used the Mann-Whitney U test for each variable;
again we used the Monte Carlo method to compute the significance level, with a
99% confidence interval.

For the thyroid database, significant differences are found for both the train and
test variables; for the heart database, significant differences are found only for
the train variable. Table 2 shows the grouping of the algorithms.



Table 1. Partition 1 experimental results (accuracy, %)

                           Thyroid database      Heart database
Algorithm                  Train      Test       Train      Test
MLClassif (VCramer)        30.12^a    33.38^a    89.16^b    88.06^a
MLClassif (Mantaras)       85.46^b    87.45^b    92.61^c    89.55^a
MLClassif (MLRelevance)    96.39^c    95.26^c    93.U       88.6^a
KNN IB4                    87.4^b     83.9^b     71.8^a     88.1^a
C5.0                       98.3^c     95.4^c     93.1^c     87.7^a






P. Pinero et al. 



Table 2. Groups resulting from applying the statistical tests to the classification results

Group   Thyroid (Train and Test)        Heart (Train)
1       C5.0^c,                         C5.0^c,
        MLClassif (MLRelevance)^c       MLClassif (MLRelevance)^c,
                                        MLClassif (Mantaras)^c
2       KNN IB4^b,                      MLClassif (VCramer)^b
        MLClassif (Mantaras)^b
3       MLClassif (VCramer)^a           KNN IB4^a



From the tables above we conclude that the methods of group 1 are better than those
of group 2, the methods of group 2 are better than those of group 3, and the methods
within the same group do not differ significantly.



5 Conclusions 



The purpose of this paper was to discuss feature selection methods. We presented
three common feature selection approaches: statistical methods, the logical
combinatorial pattern recognition approach and the artificial intelligence approach.
For each approach we discussed some methods and algorithms.

Statistical methods are presented as antecedents of the other methods, with their
specific techniques for selecting and transforming variables. For logical
combinatorial pattern recognition we discuss testor theory and its application to
classification and feature selection problems. Different artificial intelligence
techniques are presented and their properties briefly discussed.

We introduce two new relevance criteria: the MLRelevance R(A) and the
PRelevance RP(A).

These feature selection criteria maximize the heterogeneity among elements that
belong to different classes and the homogeneity among elements that belong to the
same class. R(A) is always defined for any set S, 0 < R(A) < 1, and is not sensitive
to the number of feature values. The computation of RP(A) is very expensive, and we
propose an implementation of PRelevance on a parallel platform.

Finally, we compare different feature selection measures on two medical databases:
VCramer (a statistical measure), the C5.0 algorithm (Quinlan's gain), Mantaras and
MLRelevance. We conclude that C5.0 and MLRelevance obtain the best results and
VCramer the worst results on the Thyroid database, while on the Heart database C5.0,
MLRelevance and Mantaras obtain the best results and KNN the worst.

Abstract. New results are outlined of investigations into the development of
mathematical tools for the analysis and estimation of information represented by
images. The paper continues the study of a new class of image algebras (IA), the
Descriptive Image Algebras (DIA). Practical implementation of DIA in image
analysis applications requires a study of sets of operations that do or do not lead
to the construction of a DIA, and that do or do not have a physical interpretation.
The ring operations in these algebras are both standard algebraic operations
and special operations of image processing and transformation. The question of
which operations can be used to construct a DIA, and of how this possibility
is connected with the physical interpretation of the corresponding algebra
operations, is still open. This problem is reduced to formulating the conditions
that a set of operations must satisfy for the construction of a DIA. The first stage
of its solution is the construction of examples of sets of operations (having
physical meaning) that do or do not lead to the construction of a DIA. The basic
results of the report are a method for testing the specified conditions and
examples of sets with various elements and operations introduced on them
(both generating algebras and not).



1 Introduction 

This paper describes new results of investigations into the development of a
mathematical apparatus for the analysis and evaluation of information represented in
the form of images. These studies have been conducted in recent years at the
Scientific Council “Cybernetics”, Russian Academy of Sciences, and concern the
development and implementation of the Descriptive Approach to Image Analysis and
Recognition [2]. A new class of image algebras (the standard IA was described in [4],
Definition 1) is defined within this framework: the Descriptive Image Algebras
(DIA [3], Definition 2). The main purpose of this investigation is the construction
of a unifying theory that covers different transformations and operations of
image analysis, processing and understanding. DIA generalizes some well-known
mathematical theory, and the algebraic specificity of DIA is defined by the fact that
the elements of the ring include both the models of images and the operations on
images. This specific feature of DIA provides for the effective synthesis and
implementation of the basic procedures of formal image description, processing,
analysis, and recognition. The developed algebraic tools can therefore help to solve
many problems related to intelligent computer systems and to the automation of
image-based decision making (e.g., design of systems for automated image processing
and analysis, design of systems for automation of scientific research, high-quality
medical and technical diagnostics, ecological monitoring, remote sensing,
non-destructive testing, quality control). Another useful practical application of
the new algebra is the development of a language for comparing and standardizing
different algorithms for image analysis, recognition and processing.

The practical implementation of DIA in image analysis applications requires a study
of sets of operations that do or do not lead to the construction of a DIA, and that
do or do not have a physical interpretation. The ring operations in these algebras
can be both standard algebraic operations and special operations of image processing
and transformation. The question of which operations can be used to construct a DIA,
and of how this possibility is connected with the physical interpretation of the
corresponding algebra operations, is still open. Thus, the following problems arise:
(a) to define the class of allowable operations that have physical meaning; (b) to
define the class of allowable operations that have no physical meaning; (c) to define
the class of operations (both with and without physical meaning) that do not lead to
the construction of algebras. On the whole, this problem is reduced to formulating
the conditions that a set of operations must satisfy for DIA construction.

The first stage is the construction of examples of sets of operations (having
physical meaning) that lead (or do not lead) to DIA construction. The basic results
of the paper are (a) two examples of a basic DIA having one ring, with various
elements and operations of a certain physical meaning: in Example 2 the elements of
the ring are images; in Example 3 the elements of the ring are binary operations on
images; (b) Example 4, where the constructed set of elements with the operations
introduced on it is an additive group.

The following lines of research, dating back to the 1970s and 1980s, contributed to
the development of DIA: (a) the Algebra of Zhuravlev (Yu. Zhuravlev and his scientific
school [6]); (b) the Descriptive Approach for Image Analysis and Understanding
(I. Gurevich [2]); (c) the General Pattern Theory developed by U. Grenander [1]; (d)
the extended image algebra developed by G. Ritter [4].



2 Images and Operations on Images. The Main Concepts and
Definitions



2.1 Basic Definitions 

Definition 1: An algebra over a field A is called an IA if the elements of its ring
are images (sets of points) and the values and properties associated with these images.

Definition 2: An algebra over a field A is called a DIA if the elements of its ring
are either the models of images (including the images themselves and the values and
properties associated with these images), or operations on images, or both the models
and the operations.

Definition 3: A DIA is called a basic DIA if its ring consists either of the image
models or of the operations on images.

For generality of the results, in constructing the examples we use the definition
of an image (see Definition 4) and the operations on images introduced by G. Ritter
in [4].

Definition 4: Let F be a set of values and X be a set of points. An F-valued image on
X is any element of $F^X$ (i.e., $a: X \to F$):

$I = \{(x, a(x)),\ x \in X,\ a(x) \in F\}$.



2.2 Operations on Images ([4]) 

Let $I_1 = \{(x, a(x)),\ x \in X\}$, $I_2 = \{(x, b(x)),\ x \in X\}$, and let the
following operations be defined on the set F: the operations of addition,
multiplication and finding the maximum, the inverse operations of difference,
division and finding the minimum, and the operation of raising to a power
($\forall a(x), b(x) \in F$: $\exists!\, a(x) + b(x)$; $\exists!\, a(x) * b(x)$;
$\exists!\, a(x) \vee b(x)$; $\exists!\, a(x) - b(x)$; $\exists!\, a(x) \wedge b(x)$;
for $b(x) \neq 0$ $\exists!\, a(x)/b(x)$; for $a(x) > 0$ $\exists!\, a(x)^{b(x)}$).

The basic operations on images from $F^X$ are pointwise addition, multiplication,
and finding the maximum, respectively:

(1) $I_1 + I_2 = \{(x, c(x)),\ c(x) = a(x) + b(x),\ x \in X\}$;

(2) $I_1 * I_2 = \{(x, c(x)),\ c(x) = a(x) * b(x),\ x \in X\}$;

(3) $I_1 \vee I_2 = \{(x, c(x)),\ c(x) = a(x) \vee b(x),\ x \in X\}$.

The operations of difference, division, and finding the minimum are introduced as
operations inverse to addition, multiplication, and finding the maximum, respectively:

(4) $I_1 - I_2 = \{(x, c(x)),\ c(x) = a(x) - b(x),\ x \in X\}$;

(5) $I_1 / I_2 = \{(x, c(x)),\ c(x) = a(x)/b(x) \text{ if } b(x) \neq 0,\ c(x) = 0 \text{ if } b(x) = 0,\ x \in X\}$;

(6) $I_1 \wedge I_2 = \{(x, c(x)),\ c(x) = a(x) \wedge b(x),\ x \in X\}$.
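
Operations (1)-(6) can be illustrated with a short Python sketch. It is not taken
from [4]: images are represented, as an assumption, by dictionaries mapping points of
X to values of F, and the helper name `pointwise` is hypothetical.

```python
def pointwise(op):
    """Lift a binary operation on values of F to images over a shared point set X."""
    def apply(img1, img2):
        assert img1.keys() == img2.keys(), "images must share the point set X"
        return {x: op(img1[x], img2[x]) for x in img1}
    return apply


add = pointwise(lambda a, b: a + b)                   # operation (1)
mul = pointwise(lambda a, b: a * b)                   # operation (2)
maximum = pointwise(max)                              # operation (3)
sub = pointwise(lambda a, b: a - b)                   # operation (4)
div = pointwise(lambda a, b: a / b if b != 0 else 0)  # operation (5): 0 where b(x) = 0
minimum = pointwise(min)                              # operation (6)

# Two tiny images on X = {(0, 0), (0, 1)} with real values.
I1 = {(0, 0): 4.0, (0, 1): 2.0}
I2 = {(0, 0): 2.0, (0, 1): 0.0}

total = add(I1, I2)
ratio = div(I1, I2)
```

The division follows the convention of operation (5): wherever the second image is
zero, the result is set to zero instead of being undefined.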

Similarly, we may introduce other operations on images:

(7) $I_1^{I_2} = \{(x, c(x)),\ c(x) = a(x)^{b(x)} \text{ if } a(x) > 0, \text{ otherwise } c(x) = 0,\ x \in X\}$.

We may also introduce unary operations, for example multiplication by an element of
the field of real numbers ($\alpha \in R$):
$\alpha I_1 = \{(x, c(x)),\ c(x) = \alpha\, a(x),\ x \in X\}$.



3 Conditions of Membership in DIA-Class: Checking
Satisfiability



3.1 Examples of Sets of Operations Which Generate an Algebra 

Various sets U are considered, with the operations of addition, multiplication and
multiplication by an element of the field of real numbers introduced on them.

1. The elements of the set U:

1.1 Images defined on the set X with an arbitrary range of values F of dimension
equal to the dimension of the set X, i.e. $X, F \subset R^n$;

1.2 Images defined on the set X with the range of values X, $X \subset R^n$;

1.3 Standard binary operations on images [4].

2. Operations on the set elements (2.1, 2.2, 2.3): addition; multiplication;
multiplication by an element from the field of real numbers.

3. Physical meaning of the operations:

3.1 (a) addition - total brightness of two images; (b) multiplication - pointwise
filter; (c) multiplication by an element from the field of real numbers -
proportional increase or decrease in image brightness.

3.2 (a) addition - total brightness of two images; (b) multiplication - (b1) global
(non-pointwise) filter; (b2) definition of one image on a set defined by the other
image; (c) multiplication by an element from the field of real numbers -
proportional increase or decrease in image brightness.

3.3 (a) addition - global filter: first, the two operations are applied to both
images; then the resulting images are added; (b) multiplication - global filter: the
second operation is applied to both images; the first and the second operands of the
first operation are the result of applying the second operation; (c) multiplication
by an element from the field of real numbers - multiplication of an image by an
element from the field of real numbers (a unary operation on the image: the standard
operation of multiplying an image by an element of the field of real numbers [4]).

Below, we give examples of DIA generated by the sets U with the specified
characteristics (the types of set elements and operations, together with their
physical interpretation, are specified in parentheses according to the list given
above). The proof that a DIA is generated by the set U with the operations of
addition, multiplication and multiplication by an element of the field of real
numbers introduced on it is based on checking the properties of an algebra (the
definition of an algebra is in [5]). Example 1 (1.1, 2.1, 3.1) is omitted due to
space limitations.

Example 2: (1.2, 2.2, 3.2).

Suppose that:

• R is the field of real numbers;

• $I = \{(x, f(x)),\ x \in X,\ f(x) \in X\}$ ($X \subset R^n$, $n \in N$);

• $I_1 = \{(x, a(x)),\ x \in X,\ a(x) \in X\}$, $I_2 = \{(x, b(x)),\ x \in X,\ b(x) \in X\}$;

• the operation of addition of two elements from X is introduced:
$\forall a(x), b(x) \in X$: $\exists!\, a(x) + b(x) \in X$; this operation satisfies
the following conditions ($\forall a(x), b(x), c(x) \in X$):

2.1 $a(x) + (b(x) + c(x)) = (a(x) + b(x)) + c(x)$;

2.2 $a(x) + b(x) = b(x) + a(x)$;

2.3 $\forall a(x) \in X,\ \exists\, 0 \in X$: $a(x) + 0 = a(x)$;

2.4 $\forall a(x) \in X,\ \exists\, (-a(x)) \in X$: $a(x) + (-a(x)) = 0$;

• the operation of superposition of two elements from X is introduced:
$\forall a(x), b(x) \in X$: $\exists!\, a(b(x)) \in X$;

• the operation of multiplication by an element of the field R is introduced on the
set X: $\forall \alpha \in R,\ a(x) \in X$: $\exists!\, \alpha a(x) \in X$; this
operation satisfies the following conditions ($\forall a(x), b(x), c(x) \in X$,
$\forall \alpha, \beta \in R$):

2.5 $(\alpha a(x) + \beta b(x))\, c(x) = \alpha a(x) c(x) + \beta b(x) c(x)$;

2.6 $\alpha(\beta a(x)) = \alpha\beta\, a(x)$;

2.7 $(\alpha + \beta)\, a(x) = \alpha a(x) + \beta a(x)$;

2.8 $\alpha(a(x) + b(x)) = \alpha a(x) + \alpha b(x)$.

(Thus the set X is a vector space over the field R (properties 2.1-2.4, 2.7, 2.8).)
Let us introduce

• the operation of addition of two images $I_1$, $I_2$:

$I_1 + I_2 = \{(x, a(x) + b(x)),\ x \in X\}$;

• the operation of multiplication of two images $I_1$, $I_2$:

$I_1 * I_2 = \{(x, a(b(x))),\ x \in X\}$ (this operation does not take us out of the set X);

• the operation of multiplication of an image I by an element of the field of real
numbers $\alpha \in R$:

$\alpha I = \{(x, \alpha f(x)),\ x \in X\}$.

All properties of a ring, field, and vector space are satisfied; thus, the
constructed structure is an algebra.
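
The three operations of this example can be sketched in Python as follows. This is an
illustrative sketch only, under the assumptions that X = R (n = 1) and that an image is
represented by the function taking each point x to its value; the helper names and the
sample images are hypothetical. Checking identities at a few sample points is a spot
check, not a proof.

```python
def add(i1, i2):
    """Pointwise sum: (I1 + I2)(x) = a(x) + b(x)."""
    return lambda x: i1(x) + i2(x)


def mul(i1, i2):
    """Superposition: (I1 * I2)(x) = a(b(x)); stays in X because b maps X into X."""
    return lambda x: i1(i2(x))


def scale(alpha, i):
    """Multiplication by a real number: (alpha I)(x) = alpha * f(x)."""
    return lambda x: alpha * i(x)


# Hypothetical images mapping X = R into itself.
a = lambda x: 2.0 * x + 1.0
b = lambda x: x * x
c = lambda x: x - 3.0

samples = [-2.0, -0.5, 0.0, 1.0, 3.5]

# (I1 * I2) * I3 = I1 * (I2 * I3): superposition is associative.
associative = all(
    mul(mul(a, b), c)(x) == mul(a, mul(b, c))(x) for x in samples
)
# (I1 + I2) * I3 = I1 * I3 + I2 * I3: distributivity in the first operand.
right_distributive = all(
    mul(add(a, b), c)(x) == add(mul(a, c), mul(b, c))(x) for x in samples
)
```

Representing multiplication as function composition makes its physical reading
explicit: one image is evaluated at the points selected by the other.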

Example 3: (1.3, 2.3, 3.3).

Suppose that

• R is the field of real numbers;

• the elements of the set U are the binary operations on images [4];

• $A, B, C, \ldots$ are images mapping X into X, $X \subset R^n$;

• operations on images are introduced [4];

• $r_1, r_2, \ldots$ belong to the set of standard binary operations on images
(addition, multiplication, maximum, minimum, difference, division, raising to a
power), i.e., $r_1, r_2, \ldots$ are operations on two images;

• $r(A, B)$ is the image obtained by applying the operation r to the images A and B.

Let us introduce

• the operation of addition of two operations $r_1$, $r_2$:

$(r_1 \oplus r_2)(A, B) = r_1(A, B) + r_2(A, B)$;

• the operation of multiplication of two operations $r_1$, $r_2$:

$(r_1 \otimes r_2)(A, B) = r_1(r_2(A, B), r_2(A, B))$;

• the operation of multiplication of an operation r by an element of the field of
real numbers $\alpha \in R$: $(\alpha r)(A, B) = \alpha\, r(A, B)$ (the right-hand side
denotes the multiplication of an image by an element of the field).

All properties of a ring, field, and vector space are satisfied; thus, the
constructed structure is an algebra.
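
Because the ring elements here are themselves operations, the construction maps
naturally onto higher-order functions. The Python sketch below is an illustration
under simplifying assumptions: images are dictionaries from points to real values
(rather than maps of X into X), only three of the standard operations are shown, and
all function names are hypothetical.

```python
def img_add(a, b):
    return {x: a[x] + b[x] for x in a}


def img_mul(a, b):
    return {x: a[x] * b[x] for x in a}


def img_max(a, b):
    return {x: max(a[x], b[x]) for x in a}


def img_scale(alpha, a):
    return {x: alpha * a[x] for x in a}


def op_add(r1, r2):
    """(r1 (+) r2)(A, B) = r1(A, B) + r2(A, B)."""
    return lambda A, B: img_add(r1(A, B), r2(A, B))


def op_mul(r1, r2):
    """(r1 (x) r2)(A, B) = r1(r2(A, B), r2(A, B))."""
    def combined(A, B):
        inner = r2(A, B)
        return r1(inner, inner)
    return combined


def op_scale(alpha, r):
    """(alpha r)(A, B) = alpha * r(A, B)."""
    return lambda A, B: img_scale(alpha, r(A, B))


A = {0: 1.0, 1: 2.0}
B = {0: 3.0, 1: 4.0}

sum_of_ops = op_add(img_add, img_mul)(A, B)       # (A + B) + (A * B)
product_of_ops = op_mul(img_mul, img_add)(A, B)   # (A + B) * (A + B)
scaled_op = op_scale(0.5, img_max)(A, B)          # 0.5 * max(A, B)
```

Each combinator returns a new binary operation on images, so the results can be
combined again, which is what closure of the set U under the three operations requires.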



3.2 Examples of Sets of the Operations Which Do Not Provide Generation 
of Algebra 

Similarly to Section 3.1, various sets U are considered, with the operations of
addition, multiplication, and multiplication by an element of the field of real
numbers introduced on them.

1. The elements of the set U:

1.4 Images defined on the set X with an arbitrary range of values F of dimension
equal to the dimension of the set X, i.e. $X, F \subset R^n$;

1.5 Images defined on an arbitrary set $X_i$ with an arbitrary range of values $F_i$
of dimension equal to the dimension of the set $X_i$, i.e. $X_i, F_i \subset R^n$,
$i = 1, 2, \ldots$

2. Operations on the elements of a set (2.4, 2.5): addition, multiplication, and
multiplication by an element from the field of real numbers.

3. Physical meaning of the operations:

3.4 (a) addition - total brightness of two images; (b) multiplication - (b1) global
(non-pointwise) filter; (b2) definition of one image on a set defined by the other
image (when this operation is not defined ($F \not\subset X$), the value of the
second operand is taken as the result of the multiplication); (c) multiplication by
an element from the field of real numbers - proportional increase or decrease in
image brightness.

3.5 (a) addition - total brightness of two images on the intersection of the sets on
which these images are given; at the points of a set X where only one image is
defined, that image is taken as the result of the operation; (b) multiplication -
(b1) global (non-pointwise) filter; at the points of a set X where only one image is
defined (the first or the second operand), that image (the first or the second
operand, respectively) is taken as the result of the operation; (c) multiplication
by an element from the field of real numbers - proportional increase or decrease in
image brightness.

Below, we give examples of constructions generated by the sets U with the
specified characteristics (the types of set elements and operations, together with
their physical interpretation, are specified in parentheses according to the list
given above). These constructions are not algebras. Example (1.5, 2.5, 3.5) is
omitted due to space limitations.

Example 4 (1.4, 2.4, 3.4)

Suppose that:

• R is the field of real numbers;

• $I = \{(x, f(x)),\ x \in X,\ f(x) \in F\}$ ($X, F \subset R^n$, $n \in N$);
$I_1 = \{(x, a(x)),\ x \in X,\ a(x) \in F_1\}$;
$I_2 = \{(x, b(x)),\ x \in X,\ b(x) \in F_2\}$, where $F_1$ and $F_2$ are the ranges of
values of the images $I_1$ and $I_2$, respectively, on the set X;

• the operation of addition of two elements from $F_i, F_j \subset R^n$ is introduced:
$i, j = 1, 2, \ldots$, $\forall a(x) \in F_i,\ b(x) \in F_j$:
$\exists!\, a(x) + b(x) \in F_k$, $k = 1, 2, \ldots$, $F_k \subset R^n$; this operation
satisfies the following conditions ($\forall a(x) \in F_i,\ b(x) \in F_j,\ c(x) \in F_\gamma$,
$i, j, \gamma = 1, 2, \ldots$):

$a(x) + (b(x) + c(x)) = (a(x) + b(x)) + c(x)$;

$a(x) + b(x) = b(x) + a(x)$;

$\forall i$: $\forall a(x) \in F_i,\ \exists\, 0 \in F_i$: $a(x) + 0 = a(x)$;

$\forall i$: $\forall a(x) \in F_i,\ \exists\, (-a(x)) \in F_i$: $a(x) + (-a(x)) = 0$;

• the operation of superposition of two elements from $F_i, F_j$ is introduced:
$i, j = 1, 2, \ldots$, $\forall a(x) \in F_i,\ b(x) \in F_j$, at the points where
$b(x) \in X$: $\exists!\, a(b(x)) \in F_i$;

• the operation of multiplication by an element of the field R is introduced on the
set F: $\forall \alpha \in R,\ a(x) \in F$: $\exists!\, \alpha a(x) \in F$; this
operation satisfies the following conditions ($\forall a(x) \in F_i,\ b(x) \in F_j,\
c(x) \in F_\gamma$, $i, j, \gamma = 1, 2, \ldots$, $\forall \alpha, \beta \in R$):

$(\alpha a(x) + \beta b(x))\, c(x) = \alpha a(x) c(x) + \beta b(x) c(x)$;
$\alpha(\beta a(x)) = \alpha\beta\, a(x)$;

$(\alpha + \beta)\, a(x) = \alpha a(x) + \beta a(x)$;
$\alpha(a(x) + b(x)) = \alpha a(x) + \alpha b(x)$.

Let us introduce:

• the operation of addition of two images $I_1$, $I_2$:

$I_1 + I_2 = \{(x, a(x) + b(x)),\ x \in X\}$;

• the operation of multiplication of images $I_1$, $I_2$:

\[ I_1 * I_2 = \begin{cases} (x, a(b(x))), & b(x) \in X \\ (x, b(x)), & b(x) \notin X \end{cases} \]

• the operation of multiplication of an image I by an element of the field of real
numbers $\alpha \in R$:

$\alpha I = \{(x, \alpha f(x)),\ x \in X\}$.

The constructed structure is not an algebra: it is an additive group, because the
associativity of the multiplication operation is not satisfied while all properties
of an additive group are satisfied (the definition of a group is in [5]).



4 Conclusion 

Examples of sets with various elements and operations introduced on them are
considered, both forming algebras and not. To construct the examples, the standard
operations on images are used, to show that DIA covers the mathematical
constructions of the standard IA [4] that satisfy the conditions of an algebra [5].
The examples in this paper are a first step in ordering the operations, introduced
on various sets, that do and do not generate DIA. In future work we plan to
formulate precisely the necessary and sufficient conditions for DIA generation by
sets of the considered type.

Our technique for formulating and checking the conditions on a set of operations
that ensure DIA construction is a basis for obtaining a mathematically proven
criterion for choosing operations to produce efficient algorithmic schemes for image
analysis and recognition. This criterion is useful for practical implementation.

Abstract. The paper presents recent results on establishing existence conditions
for a class of efficient algorithms for the image recognition problem that includes
an algorithm solving this problem correctly. The proposed method for checking
the satisfiability of these conditions is based on a new definition of image
equivalence introduced for a special formulation of the image recognition
problem. It is shown that the class of efficient algorithms based on estimate
calculation contains the correct algorithm in its algebraic closure. The main
result is an existence theorem. The theoretical results obtained will be applied to
the automation of lymphoid tumor diagnostics using hematological specimens.



1 Introduction 

Over the last several years, research on the development of mathematical techniques
for image analysis and estimation has been conducted at the Scientific Council
“Cybernetics” of the Russian Academy of Sciences. The theoretical basis for this
research is the Descriptive Theory of Image Analysis and Recognition [2-5,10,11].
This paper presents recent results on establishing existence conditions for a class
of efficient algorithms for the image recognition problem that includes an algorithm
solving this problem correctly. The proposed method for checking the satisfiability
of these conditions is based on a new definition of image equivalence, introduced
for a special formulation of the image recognition problem. It is shown that the
class of efficient Algorithms Based on Estimate Calculation (AEC) [10,11] contains
the correct algorithm in its algebraic closure. The main result is an existence
theorem.

One of the topical issues in the field of image recognition is the search for an
algorithm that correctly classifies an image using its description. The approach
to image recognition developed by the authors is a specialization of the algebraic
approach to recognition and classification problems (Yu. Zhuravlev [10,11]). The
main idea of this approach is as follows. There are no accurate mathematical models
for such poorly formalizable areas as geology, biology, medicine, and sociology.
However, in many cases nonrigorous methods based on heuristic considerations yield
appreciable practical results. Therefore, it is enough to construct a family of such
heuristic algorithms for solving the appropriate problems, and then to construct the
algebraic closure of this family. An existence theorem has been proved which states
that any problem from a set of problems concerning poorly formalized situations is
solvable in this algebraic closure [10].

Image recognition is a classical example of a problem with ill-formalized and
partly contradictory information. This gives us good reason to believe that the
algebraic approach to image recognition can lead to important results and,
consequently, that an “algebraization” of the field is the most promising way of
developing the desired mathematical techniques for image analysis and estimation.

Note that the idea of creating a unified algebraic theory covering the different
approaches and procedures used in image and signal processing has a certain
history, beginning with the works of von Neumann and extended by S. Unger, U.
Grenander, M. Duff, Yu. Zhuravlev, G. Matheron, G. Ritter, J. Serra, et al. [1,7-11].
Our research is carried out in the field of the Descriptive Approach to Image
Analysis and Recognition, which differs from the algebraic studies mentioned above
and is completely original.

Unfortunately, the algebraic approach developed by Yu. Zhuravlev cannot be
applied directly to an image recognition problem. This is mostly due to the
complexity of an image as a recognition object and to considerable distinctions
between the classical pattern recognition problem and the image recognition problem,
which are as follows:

• a standard object in classical pattern recognition theory is, as a rule, described
by a set of features, whereas there is no such natural way of describing an image
that does not lose important information about it; common methods of image
description are either too complicated and require considerable computational
resources (e.g., raster representation of an image), or semantically primitive
(a set of features);

• a single object (a scene) can correspond to several images differing in brightness,
contrast, scale, and observer’s point of view; within the framework of a
recognition problem, this means that different images of an object should be
classified identically by the recognizing algorithm.

Thus, the problem of image equivalence is of great interest, especially the
use of this property in the pattern recognition problem [3-5].



2 Image Equivalence 

An image equivalence relation on the set of images may be introduced in different
ways:

a) we may consider equivalence as closeness of image descriptions with respect to
a metric in a metric space, for example, a metric in the Euclidean space $E^n$,
where an image is described by its n-dimensional feature vector;

b) if a set of allowable image transformations is given in the image recognition
problem (for example, image rotations by the angles $2\pi k/n$,
$k = 0, 1, \ldots, n-1$), two images are considered equivalent if the first image
is obtained from the second by applying a certain transformation from the given
set of allowable transformations.

In this work, we propose a definition of image equivalence based on a special
setup of an image recognition problem. Consider a set of allowable images, described
by their n-dimensional feature vectors, and a recognizing algorithm A, which
constructs an l-dimensional information vector from the n-dimensional description
vector. Recall that an information vector is a vector of an object’s membership in
classes, where the values $0$, $1$, $\Delta$ of the vector components mean “object
does not belong to a class”, “object belongs to a class”, “algorithm fails to
determine whether object belongs to a class or not”, respectively [10].

Definition 1. Two images are equivalent with respect to a recognizing algorithm A, if 
their information vectors, obtained by the recognizing algorithm A, coincide. 

A simple way of constructing an image equivalence class follows from
this definition. The idea is to apply a certain set of transforms to a generating
image.

Let a binary image I of a plane polygon be given. We call it the generating image. Let
the transforms of the plane rotation group C_n be given as the set of transforms: each
group C_n of order n consists of all rotations by the angles 2πk/n, k=0,1,...,n-1, around
a fixed point, and it is essential that the given transforms form a group. Applying each
transform from C_n to the generating image I, we obtain a set of images.

The equivalence of the obtained images is established by the identity of their
information vectors. Here it is reasonable to describe images by vectors of invariant
features, and the simplest way is to use invariants with respect to the given
group of transformations C_n. As a result, all images from the obtained set have the
same feature vectors, and the recognizing algorithm constructs the same information
vectors for these images. Consequently, all obtained images are equivalent. It is
essential that, when the transformations form a group, mathematical methods are
available for constructing invariants with respect to this group of transforms [6].
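
The construction above can be illustrated with a minimal sketch. It assumes, for illustration only, that the polygon is given by integer vertex coordinates, that the group is C_4 (so rotations are exact), that the invariant features are the sorted squared distances of the vertices to the rotation centre, and that the recognizing algorithm A is the hypothetical rule `recognize`; none of these choices comes from the paper.

```python
def rotate_c4(vertices, k):
    """Rotate integer vertices by k * 90 degrees about the origin (group C_4)."""
    for _ in range(k % 4):
        vertices = [(-y, x) for (x, y) in vertices]
    return list(vertices)


def invariant_features(vertices):
    """Sorted squared vertex distances to the rotation centre (rotation invariant)."""
    return tuple(sorted(x * x + y * y for (x, y) in vertices))


def recognize(features):
    """Hypothetical recognizing algorithm A with l = 2 classes.

    Each component is 1 (belongs), 0 (does not belong) or "Δ" (undecided).
    """
    spread = max(features) - min(features)
    regular = 1 if spread == 0 else 0                           # class 1: equidistant vertices
    elongated = "Δ" if len(features) < 3 else int(spread > 0)   # class 2
    return (regular, elongated)


generating_image = [(2, 0), (0, 1), (-2, 0), (0, -1)]        # a rhombus
orbit = [rotate_c4(generating_image, k) for k in range(4)]    # images produced by C_4
information_vectors = {recognize(invariant_features(image)) for image in orbit}
```

Because the features are invariant under C_4, the set `information_vectors` contains a single vector, so all images of the orbit are equivalent in the sense of Definition 1.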



3 Mathematical Formulation of a Recognition Problem 

In order to prove the existence theorem for an AEC (an algorithm based on estimate
calculation) that correctly solves the image recognition problem, it is necessary to
introduce a new formulation of the image recognition problem, which differs both from
the classical formulation [10,11] and from the formulation of the Descriptive
Approach [2]. Let us look at these formulations.

3.1 Classical Mathematical Formulation of Pattern Recognition Problem Z

Z(I_0, S_1,...,S_q, P_1,...,P_l) is a pattern recognition problem, where I_0 is the
allowable initial information, S_1,...,S_q is a set of allowable objects described by
feature vectors, K_1,...,K_l is a set of classes, and P_1,...,P_l is a set of predicates on
allowable objects, P_i=P_i(S), i=1,2,...,l. The problem Z is to find the values of the
predicates P_1,...,P_l.

Definition 2. An algorithm A is correct for a problem Z [10] if the following equation is
satisfied:

A(I_0, S_1,...,S_q, P_1,...,P_l) = ||α_ij||_{q×l}, where α_ij = P_j(S_i).

3.2 Mathematical Formulation of Image Recognition Problem Z^1



Taking into account the introduced notion of image equivalence, an image recognition
problem Z^1 may be formulated in the following way. The problem is specified by the
following components: I_i^{j_i} are images, i=1,2,...,q, where j_i is the number of an
image within the i-th equivalence class and p_i is the number of images in the i-th
equivalence class, j_i=1,2,...,p_i; M_i={I_i^1, I_i^2,...,I_i^{p_i}}, i=1,2,...,q, are the
equivalence classes on the set {I_i^{j_i}}; K_1,K_2,...,K_l are the classes in the image
recognition problem; P_t^{i j_i}: “I_i^{j_i} ∈ K_t”, t=1,2,...,l, i=1,2,...,q, j_i=1,2,...,p_i,
are predicates. The problem Z^1 is to find the values of the predicates P_t^{i j_i}.



3.3 Mathematical Formulation of Image Recognition Problem Z^2

The distinction between the problem Z^2 and the problem Z^1 is that each equivalence
class is replaced by a single image, a representative of the class, with number n_i,
1 ≤ n_i ≤ p_i, where i is the number of the equivalence class. This replacement is
realized with the help of the definition of an allowable transform.

Definition 3. An arbitrary transform f: {I_i^{j_i}} → {I_i^{j_i}} is an allowable transform
if f(I_i^{j_i}) and I_i^{j_i} belong to the same equivalence class for each I_i^{j_i}.

The image recognition problem Z^2 is specified by the following components:
I_i^{n_i}, i=1,2,...,q, are images, I_i^{n_i} ∈ M_i; K_1,K_2,...,K_l are the classes in the
image recognition problem; P_t^i: “I_i^{n_i} ∈ K_t”, t=1,2,...,l, i=1,2,...,q, are
predicates. The problem Z^2 is to find the values of the predicates P_t^i.



4 Conditions of Completeness of the Class of Algorithms Based on 
Estimate Calculation for Image Recognition Problem 

The main result of the paper is obtained for a class of efficient recognizing algorithms,
the AEC [10,11]. These algorithms are based on a formalization of the idea of precedence
or partial precedence: the “proximity” between parts of the descriptions of the objects
classified previously and of the object to be classified is analyzed. Suppose we have the
standard descriptions of the objects {S̃}, S̃ ∈ K_j, and {S'}, S' ∉ K_j, and a method of
evaluating the “closeness” between parts of the description I(S) and the corresponding
parts of the descriptions {I(S̃)}, {I(S')}, where S is an object presented for
recognition, j=1,2,...,l.

By evaluating the “proximity” between the parts of the descriptions {I(S̃)} and I(S),
and between I(S) and {I(S')}, respectively, it is possible to evaluate a
generalized “proximity” between S and the sets of objects {S̃}, {S'} (in the simplest
case, the value of the generalized “proximity” is the sum of the values of “proximity”
between the parts of the descriptions). Then the total estimate of the object for a class
is formed from this set of estimates; it is the value of the object’s membership function
for the class.
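
The simplest case mentioned above (a generalized proximity equal to the sum of part-wise proximities) might be sketched as follows. The support sets (all feature subsets of a fixed size), the threshold `eps`, the normalization by class size and the function names are assumptions made for this illustration, not the definitions used in [10,11].

```python
from itertools import combinations


def proximity(part_a, part_b, eps):
    """1 if every compared feature differs by at most eps, else 0."""
    return int(all(abs(a - b) <= eps for a, b in zip(part_a, part_b)))


def class_estimate(s, class_objects, subset_size, eps):
    """Sum of part-wise proximities between s and the standard objects of one class."""
    total = 0
    for support in combinations(range(len(s)), subset_size):  # parts of the description
        for ref in class_objects:
            total += proximity([s[i] for i in support],
                               [ref[i] for i in support], eps)
    return total


def estimates(s, classes, subset_size=2, eps=1):
    """Estimate of s for every class, normalized by the class size (an assumption)."""
    return {name: class_estimate(s, objs, subset_size, eps) / len(objs)
            for name, objs in classes.items()}


standards = {"K1": [(0, 0, 0), (1, 0, 1)], "K2": [(5, 5, 5), (4, 6, 5)]}
result = estimates((1, 1, 0), standards)
```

In this toy example the object (1, 1, 0) agrees with both standard objects of K1 on all three feature pairs and with none of the objects of K2, so its estimate is 3.0 for K1 and 0.0 for K2.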

The existence theorem for AEC that solves recognition problem Z correctly is 
proved for the algebraic closure of the class of AEC. 

Theorem 1 [10]. Let the conditions of non-identity of descriptions of classes and of 
objects in pattern recognition problem Z with given vectors of features be satisfied. 
Then the algebraic closure of the class of AEC-algorithms is correct for Z. 

Note that the definition of image equivalence is not used in the classical
formulation of a recognition problem; therefore, Theorem 1 on the existence of a correct
algorithm cannot be applied directly to an image recognition problem.

The distinction between the problem Z and the problem Z^1 is that, in the latter, the
image equivalence classes are explicitly considered. In order to reduce the image
recognition problem Z^1 to the standard recognition problem Z, it is necessary to
proceed from the classification of a group of objects to the classification of a single
object. The problem Z^2, which differs from Z^1 by the presence of allowable transforms
and by the absence of image equivalence classes, allows us to operate with a single
image, a representative of the corresponding equivalence class, for each equivalence
class under certain restrictions on the set of allowable transforms.

The direct generalization of Theorem 1 to the image recognition problem Z^1 is
Theorem 2.

Theorem 2. Let the allowable transforms {f_1, f_2, ...} form a transitive group. Then the
image recognition problem Z^1 may be reduced to the problem Z^2, and the algebraic
closure of the class of AEC-algorithms is correct for Z^2.

Theorem 2 establishes the conditions for the existence of a correct algorithm for the
image recognition problem and proves that such an algorithm can be found in the
algebraic closure of AEC-algorithms.



5 Conclusion 

The task of searching for the correct algorithm for image recognition problem was 
investigated. The definition of image equivalence was introduced, and the formulation 
of image recognition problem was modified. It was proved that, under certain 
restrictions on the image transforms, an image recognition problem may be reduced; 
and the correct algorithm for the reduced problem can be found in the algebraic 
closure of AEC. 

Future research will be devoted to a detailed analysis of image equivalence and to
establishing the relationship between image equivalence and image invariance. The
obtained theoretical results will be applied to the automation of lymphoid tumor
diagnostics using hematological specimens.

Abstract. In this paper we introduce a new method for recognizing and classifying
images based on concepts derived from Logical Combinatorial Pattern
Recognition (LCPR). The concept of Typical Segment Descriptor (TSD) is
introduced, and algorithms are presented to compute TSD sets from several
chain code representations, such as the Freeman chain code, the first differences
chain code, and the vertex chain code. The typical segment descriptors of a
shape are invariant to changes in the starting point, translations and rotations,
and can be used for partial occlusion detection. We show several results for
shape description problems, pointing out the reduction in the length of the
description achieved.



1 Introduction 

Recognition of 2D objects is an important task in many machine vision applications
and research areas such as robotics and computer vision [1, 2]. A 2D shape is
a feature often used for its distinctive classification power. A shape is what remains of
a region after disregarding its size, position and orientation in the plane [3].
Non-numeric shape description methods search for representations (e.g. a chain code,
a graph) of the original shape so that only important characteristics are preserved.
Other shape description techniques generate numerical descriptors given as feature
vectors.

The required properties of a shape description scheme are invariance to translation,
rotation and scaling. Shape matching or recognition refers to methods for comparing
shapes. Usually, given a group of known objects, the identical or most similar objects
in a scene must be found. There are many imaging applications where scene analysis
can be reduced to the analysis of shapes [4], though effectively representing shapes
remains one of the biggest hurdles to overcome in the field of automated recognition.

In this paper we introduce a new boundary-based method for recognizing and
classifying images that originates from concepts of Logical Combinatorial Pattern
Recognition (LCPR). The concept of Typical Segment Descriptor (TSD) is introduced.
The TSDs of a shape are computed from its Freeman chain code, so they inherit the data
reduction property of this representation. An algorithm for computing the TSD set
of closed shapes is given.

Conversion (mapping) of an analog image onto a discrete one (digitization) is based
on several assumptions [5]. It is assumed that the acquisition of an image is done
using a set of physical sensors, which can be modeled by a set of subsets of the
continuous plane. The simplest idea is to assume a discrete partition of the plane. If
only partitions involving regular polygons are considered, the number of different
partitions is reduced to three: triangles, squares, or hexagons. The choice of the
type of partition determines differences in concepts like neighborhood, adjacency, and
connectivity. In this work we assume a partition into regular squares, and the
algorithms presented assume closed boundary shapes.

Recently, machine learning and symbolic processing tools have been extended to 
Image Processing problems. New image representation concepts have been developed 
[6, 7]. The new method presented here uses ideas from LCPR. 

Chain codes are frequently used for image representation since they allow considerable
data reduction. The first approach for representing digital curves was introduced
by Freeman [8]. By means of this representation several properties of arbitrary
planar curves can be determined: moments, inertial axes, etc. [9]. Curves are encoded
as line segments that link points of a rectangular grid. These points are the grid points
closest to the curve. This process is called chain encoding.

Many authors have used chain coding for shape representation [2, 10]; a normalization
of the code with respect to the starting point is achieved by using shape numbers [3].
In [11] contours of handwritten characters are chain coded, and recognition
cost and accuracy are reported.

A new chain code for shapes composed of regular cells is defined in [12]. It is called
Vertex Chain Code (VCC), and is based on the number of cell vertices that
touch the bounding contour of the shape. Concepts of VCC are extended for
representing 3D shapes in [13], producing a curve descriptor invariant to translation.
Concepts from LCPR are used for image identification in [14], where a method for
solving supervised pattern recognition problems using binary descriptors is reported.
A generalization can be achieved by transforming numerical descriptors into k-valued
sets so that k-valued logic tools can be used.

If some features of an object take values that cannot be found in the descriptions of 
objects of the remaining classes, then such a sequence of values is called a descriptor. 
If a certain descriptor loses this property when a feature is not included, then it is 
called a Non-Reducible Descriptor (NRD) [15]. 

In [16] an algorithm (KORA) is reported to select the features that form a minimal
descriptor of every object in a database of descriptors. It has been extended and used
on non-image-like data [17, 18] and is used in [14] for recognition of objects in raw
images. In this latter work the concept of sub-description of an object is translated
into a fragment of an image. Considering each image as a one-dimensional array, a
learning matrix is formed. Differences of corresponding pixels are used to form a
dissimilarity matrix. In this manner, the concept of feature is lost and the set of
columns selected as descriptive attributes changes if transformations such as rotations,
translations or scaling are applied.

NRDs are reported as a concept for representing a minimal set of characteristics that
can be used for object recognition. The TSD proposed here pursues the same
objective but, instead of using the whole learning matrix, only the chain code is used.
This paper is organized as follows: in Section 2 TSDs are defined and their main
properties discussed. Section 3 presents results, and in Section 4 conclusions are
drawn.



2 Typical Segment Descriptors 

Suppose that a set of segmented binary 2D-images (shapes) X_1, X_2, ..., X_n is known. For
each shape, the chain code is computed based on s-connectivity, s=4,8. Each s-chain
has an associated first difference (derivative) D(X_1), D(X_2), ..., D(X_n). The derivative
of a shape is a sequence of codes representing changes of direction. The length of the
derivative D(X) will be denoted |D(X)|. It is assumed that objects can be rotated by
angles k * 360/s degrees, with k integer, without change in the derivative.
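
The derivative and its invariance to rotations by multiples of 360/s degrees can be checked with a short sketch. It assumes that the derivative is the circular difference of consecutive codes modulo s, a convention that reproduces the first differences listed for the 4-chain codes of Figure 1; the function names are illustrative.

```python
def first_difference(chain, s=4):
    """Circular first difference (derivative) of an s-connectivity chain code."""
    codes = [int(c) for c in chain]
    # codes[i - 1] wraps around to the last code when i == 0 (closed boundary).
    return "".join(str((codes[i] - codes[i - 1]) % s) for i in range(len(codes)))


def rotate_chain(chain, k, s=4):
    """Chain code of the same shape after a rotation by k * 360/s degrees."""
    return "".join(str((int(c) + k) % s) for c in chain)


C_X = "22212111001003330332"   # 4-chain code of shape X in Figure 1
C_Y = "32222110101100033332"   # 4-chain code of shape Y in Figure 1
D_X = first_difference(C_X)
D_Y = first_difference(C_Y)
```

A rotation adds the same constant to every code, so the differences between consecutive codes, and therefore the derivative, do not change.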

Definition. A sequence y_1 y_2 ... y_p is a p-segment of a circular sequence of codes
D(X)=x_1 x_2 ... x_m if p<m and for some fixed j=1,...,m it is observed that y_t=x_{j+t}
with t=1,...,p (indices are taken circularly). p is the length of the p-segment. For
example, 222310 is a 6-segment of 112223100223312.

Definition. Given two segments Θ and S of length p and q respectively, Θ is a
subsegment of S if p<q and, for a certain circular index rotation, y_t=x_t, t=1,...,p.
It is easy to observe that Θ=12112 is a subsegment of S=112223100223312.
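
A minimal sketch of the circular segment test, using the two examples just given. The helper name `is_p_segment` is an assumption; the circular reading is obtained by appending the first p-1 codes to the end of the sequence.

```python
def is_p_segment(segment, code):
    """True if `segment` occurs in `code` read as a circular sequence (p < m)."""
    p, m = len(segment), len(code)
    if not 0 < p < m:
        return False
    # Appending the first p - 1 codes exposes the segments that wrap around.
    return segment in code + code[:p - 1]


sequence = "112223100223312"
plain_case = is_p_segment("222310", sequence)      # found without wrapping
wrapped_case = is_p_segment("12112", sequence)     # found only across the wrap
```

The second example occurs only when the sequence is read circularly: the segment starts with the last two codes (12) and continues with the first three (112).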

Definition. A p-segment Θ of the derivative D(X) is a segment descriptor of image X
with respect to image Y if there does not exist a p-segment S of D(Y) such that S=Θ.
If it is not possible to eliminate either the first element or the last one from Θ while
keeping the property of segment descriptor, then Θ is a typical segment descriptor. It
will be denoted as tsd(X/Y) or simply tsd. A tsd of minimal length is called a minimal
segment descriptor. In general, it is not unique.

Figure 1 shows two shapes, the starting point from which their boundary was traversed,
the path of their 4-chain codes, which are given as shape numbers, and the
corresponding first differences.

C(X)=22212111001003330332 C(Y)=32222110101100033332

D(X)=00031300301303001303 D(Y)=13000303131030030003

Fig. 1. Two shapes, their 4-chain codes (C) and first derivatives (D)

It can be verified that Θ=30300 is a 5-segment descriptor of the shape X with respect
to Y, while S=3003 is not. Θ is not a typical segment descriptor because 3030 (a
subsegment) is a segment descriptor of X with respect to Y.

The previous definitions do not depend on the concept of the derivative. That is, they
can be applied to chain codes or other representations directly.
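
The statements about 30300, 3003 and 3030 can be verified mechanically. The sketch below applies the definitions directly to the derivatives of Figure 1; the function names are assumptions made for this illustration.

```python
def circular_segments(code, p):
    """Set of all p-segments of a circular code string."""
    m = len(code)
    if not 0 < p <= m:
        return set()
    doubled = code + code[:p - 1]
    return {doubled[i:i + p] for i in range(m)}


def is_segment_descriptor(theta, dx, dy):
    """theta occurs in D(X) but in no position of D(Y)."""
    p = len(theta)
    return theta in circular_segments(dx, p) and theta not in circular_segments(dy, p)


def is_typical(theta, dx, dy):
    """A segment descriptor that stops being one when either end is removed."""
    if not is_segment_descriptor(theta, dx, dy):
        return False
    trimmed = [theta[1:], theta[:-1]]
    return not any(t and is_segment_descriptor(t, dx, dy) for t in trimmed)


D_X = "00031300301303001303"
D_Y = "13000303131030030003"
checks = {
    "30300": (is_segment_descriptor("30300", D_X, D_Y), is_typical("30300", D_X, D_Y)),
    "3003": (is_segment_descriptor("3003", D_X, D_Y), is_typical("3003", D_X, D_Y)),
    "3030": (is_segment_descriptor("3030", D_X, D_Y), is_typical("3030", D_X, D_Y)),
}
```

Each entry of `checks` holds the pair (is a segment descriptor, is typical) for one of the three segments discussed above.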

The set of all typical segment descriptors of X with respect to Y will be denoted as
TSD(X/Y). A tsd can be present only in X_i (but not in X_j, i≠j), so it differentiates
X_i with respect to the other shapes. The set of tsds that have this property will be
denoted TSD(X_i).

The properties of a shape’s TSD set include:

Property 1. From the condition X_i≠X_j, i≠j, i,j=1,...,n, it follows that each shape has at
least one typical segment descriptor. This is obvious because in the worst case, for
each X_i, a subchain of D(X_i) of length |D(X_i)|-1 is a tsd.

Property 2. Different tsds have different discriminative power, since they can
discriminate with respect to a different number of shapes. Based on this, a weight W(Θ)
can be associated with each tsd Θ, proportional to the number of discriminated
shapes.

Property 3. Each tsd Θ∈TSD(X/Y) is linked to a unique subsequence in its original
chain code, which corresponds to a differentiating characteristic of X. Therefore, two
or more occurrences of Θ inside TSD(X/Y) can be associated with the appearance of this
characteristic with different or equal starting directions.

To highlight the differences between two occurrences of the same Θ in TSD(X/Y), we
adopt the following conventions:

• Θ^0 : the first code of the subsequence of X that originates Θ is 0.

• Θ^1 : the first code of the subsequence of X that originates Θ is 1.

• ...

• Θ^{s-1} : the first code of the subsequence of X that originates Θ is s-1.

Observe that, though a tsd Θ can appear as Θ^d (d∈{0,1,...,s-1}) in TSD(X/Y), the
same tsd can appear as Θ^h (h≠d) in another object having the same shape. In Figure 1,
01 is a tsd of X with respect to Y and the subsequence 330 originates it. That means that
the four subsequences 001, 112, 223, and 330 can never be present in Y. Figure 2
shows a graphical representation of these subsequences.

Note that not all of these sequences are necessarily present simultaneously in the
shape, but they can appear depending on the shape pose. In Figure 1, 001
and 330 are present, but neither 112 nor 223.

Fig. 2. Subsequences that can originate tsd 01 from TSD(X/Y)



The following holds for the shapes in Figure 1:

TSD(X/Y)={01^3, 01^0, 3030^1, 1303^3, 3030^0, 13003^1}
TSD(Y/X)={10^0, 131^0, 0303^2, 3031^2, 00300^0, 13000^2, 00030^2}



2.1 TSD Rotation Invariance

Property 4. Compatibility of TSD(X/Y): Let TSD(X/Y)={Θ_1^{d_1}, Θ_2^{d_2}, ..., Θ_q^{d_q}},
d_i<s. For any rotation of X with magnitude δ_j∈{0, 1, ..., s-1}, all Θ_i^{d_i} in D(X) take
the form Θ_i^{d_i⊕δ_j} for i=1,...,q, where ⊕ represents the sum mod s. This property states
that each TSD(X/Y) has s-1 other associated sets that contain the results of rotating
the original set, and that it is possible to predict in what form every tsd will appear in
each set. So, if shape X is rotated by δ_j, there will be a corresponding set TSD(X/Y)
with a known form for every tsd.
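
Property 4 can be illustrated on the chain code of shape X from Figure 1. The sketch assumes that the derivative is the circular difference of consecutive codes modulo s and that the superscript of an occurrence is the chain code immediately preceding it, i.e. the first code of the originating subsequence; the function names are illustrative.

```python
def oriented_occurrences(theta, chain, s=4):
    """Sorted superscripts d of every occurrence of theta in the derivative of chain."""
    codes = [int(c) for c in chain]
    m = len(codes)
    deriv = "".join(str((codes[i] - codes[i - 1]) % s) for i in range(m))
    doubled = deriv + deriv[:len(theta) - 1]          # circular reading
    # The occurrence starting at position i originates from the code at i - 1.
    return sorted(codes[i - 1] for i in range(m) if doubled.startswith(theta, i))


def rotate_chain(chain, k, s=4):
    """Chain code of the same shape after a rotation of magnitude k."""
    return "".join(str((int(c) + k) % s) for c in chain)


C_X = "22212111001003330332"
original = oriented_occurrences("01", C_X)            # superscripts of tsd 01 in X
rotated = {k: oriented_occurrences("01", rotate_chain(C_X, k)) for k in range(4)}
```

For every rotation magnitude k the superscripts found in the rotated shape are the original ones shifted by k modulo 4, which is the predictable form stated by the property.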

Definition. Compatibility of X with respect to Y, Comp(X/Y), is the set of all possible
n-tuples of d_i in all possible sets {Θ_1^{d_1}, Θ_2^{d_2}, ..., Θ_q^{d_q}}; these sets can be
obtained by rotating the original. A similar definition could be formulated for TSD(X).
In Figure 1 it holds that the set of tsds {1303^3, 3030^0, 13003^3} is not a compatible
set. However, {1303^0, 3030^1, 13003^1} is compatible.



2.2 Partial Occlusion Detection Using TSD 

Definition. The number of occurrences of a tsd Θ^d∈TSD(X/Y) is called the frequency
of Θ^d, α_d. The frequency of Θ is an s-tuple, Freq(Θ)=(α_0, α_1, ..., α_{s-1}). The sum
of α_i over i=0,...,s-1 is the absolute frequency of Θ in TSD(X/Y) and will be denoted
as ||Θ||.
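
As a small worked example, the frequency tuple of a tsd can be computed by counting its occurrences per starting direction. The sketch uses the 4-chain code of shape X from Figure 1 and assumes the same conventions as the figure (circular difference of consecutive codes, superscript equal to the code preceding the occurrence); the function name is an assumption.

```python
from collections import Counter


def frequency(theta, chain, s=4):
    """Freq(theta) as an s-tuple: occurrences of theta per starting direction."""
    codes = [int(c) for c in chain]
    m = len(codes)
    deriv = "".join(str((codes[i] - codes[i - 1]) % s) for i in range(m))
    doubled = deriv + deriv[:len(theta) - 1]          # circular reading
    counts = Counter(codes[i - 1] for i in range(m) if doubled.startswith(theta, i))
    return tuple(counts[d] for d in range(s))


C_X = "22212111001003330332"
freq_01 = frequency("01", C_X)          # one occurrence as 01^0, one as 01^3
absolute_01 = sum(freq_01)              # ||01||
```

Here Freq(01)=(1, 0, 0, 1) and ||01||=2, in agreement with the two occurrences 01^0 and 01^3 originated by the subsequences 001 and 330.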

Another important attribute is the relative order in which the tsds appear. Θ is the
predecessor of S (Θ=Pre(S)) if there is no other tsd in between. Then S is the
successor of Θ (S=Suc(Θ)). In Figure 1, 3030^1 is the predecessor of 01^3.

Definition. A sequence of tsds in TSD(X/Y) is connected if they, all in sequence, form
a subchain of the original chain code. The connectivity of X, denoted as Con(X), is the
set of all connected sequences in TSD(X/Y). Each single tsd is a member of Con(X).
In Figure 1, 3030^1, 01^3, 1303^3 and 3030^0 are connected.

Note that attributes of each Θ (tsd of X) such as ||Θ||, Pre(Θ), Suc(Θ), as well as
features of X such as Comp(X) and Con(X), can be used for detecting the presence of the
shape in a scene, even in case of partial occlusion. If a tsd of the shape is not detected
due to an occlusion, but some of its attributes are checked, as well as other properties
of the shape, then some certainty about the presence of the shape could be calculated.
This is the subject of our current research.



2.3 An Algorithm for the Computation of TSD(X/Y) 

In order to determine the set of all possible tsds of a shape with respect to other
shapes, TSD(X/Y), two situations can be considered: each shape constitutes a class, or
each class is formed by more than one shape. In the latter case, the procedure is the
same but only segments from different classes have to be compared [17].

Let D(X) and D(Y) be two derivatives. Each p-segment of X is tested as being a tsd.
The set of tested segments is denoted Rev(X). The algorithm is:



Step 1. Let p = 1, TSD(X/Y)=∅, Rev(X)=∅.

Step 2. Let S_p be the next p-segment that can be extracted from D(X), i.e., a p-segment
formed from an incremental starting position in D(X).

Step 3. If S_p∈Rev(X) go to step 9.

Step 4. If S_p∈TSD(X/Y) go to step 9.

Step 5. If some Θ∈TSD(X/Y) is a subsegment of S_p, go to step 8.

Step 6. If there is in D(Y) a p-segment E_p such that E_p=S_p go to step 8.

Step 7. Add S_p to TSD(X/Y), go to step 9.

Step 8. Add S_p to Rev(X).

Step 9. If it is not possible to extract another p-segment from D(X), make p=p+1.

Step 10. If p=|D(X)| end, else go to step 2.
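
The steps above can be sketched in Python as follows. This is an illustrative transcription under stated assumptions rather than the authors' implementation: segments are read circularly, the subsegment test of Step 5 is a plain substring test, and starting directions (superscripts) are ignored, so the result lists each tsd string once.

```python
def circular_segments(code, p):
    """Set of all p-segments of a circular code string."""
    m = len(code)
    if not 0 < p <= m:
        return set()
    doubled = code + code[:p - 1]
    return {doubled[i:i + p] for i in range(m)}


def typical_segment_descriptors(dx, dy):
    """TSD(X/Y) computed from the derivatives D(X) and D(Y)."""
    tsds = []            # TSD(X/Y), in order of discovery (Step 1)
    reviewed = set()     # Rev(X): segments already tested and rejected (Step 1)
    for p in range(1, len(dx)):                       # Steps 9 and 10
        y_segments = circular_segments(dy, p)
        doubled = dx + dx[:p - 1]
        for i in range(len(dx)):                      # Step 2
            s_p = doubled[i:i + p]
            if s_p in reviewed or s_p in tsds:        # Steps 3 and 4
                continue
            if any(t in s_p for t in tsds) or s_p in y_segments:   # Steps 5 and 6
                reviewed.add(s_p)                     # Step 8
            else:
                tsds.append(s_p)                      # Step 7
    return tsds


D_X = "00031300301303001303"
D_Y = "13000303131030030003"
tsd_x_given_y = typical_segment_descriptors(D_X, D_Y)
```

For the derivatives of Figure 1 the sketch yields the strings 01, 1303, 3030 and 13003, which are the distinct tsds of X with respect to Y. Because shorter segments are processed first, Step 5 rejects every longer segment that contains an already accepted tsd, which is what makes the accepted descriptors typical.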

The design of an algorithm for computing the set TSD(X) is not complex. It is
only necessary to find the intersection of all sets of tsds of X with respect to the other
shapes. The identification of X in an image is easy if the shapes are isolated in
the scene: finding at least one tsd from this intersection is sufficient to verify that
shape X appears in the scene.

Additional steps should be added if the intersection is empty. Several alternatives can
be used in these circumstances. A simple choice is to build a set by taking one tsd from
each set of tsds of X with respect to the other shapes. In order to verify the presence of
X in an image it will then be necessary to find all the elements of the set created in
this manner. In our experiments we build this set by selecting the tsds with the largest
weights.



Fig. 3. Shape contours of letters A, B, C, D



3 Results 

Figure 3 shows contours of some letters used in our experiments. Tsds that differentiate
B from the remaining shapes were computed using the proposed algorithm.
Results are shown in Table 1. Note that the sum of the lengths of the tsds in TSD(B/X),
with X={A,C,D}, is less than the length of C(B), so they describe shape B with respect
to the others in a more compact form. Even in cases where the sum of the lengths of the
tsds is bigger than the length of C(B) it will not be necessary, in general, to compare
the chain code or the derivatives with respect to all the tsds.



Table 1. Sets of typical segment descriptors of shape B with respect to the other shapes.

TSD(B/A): {33^, 1000°, 0001", 1313", 3131", 00131", 13100", 00000‘, 310013"}
TSD(B/C): {33^, 31001^, 10013“, 10000F, 0001000", 0000000*, 00000010‘}
TSD(B/D): {33^, 1313°, 1001", 00131", 10000T, 000010000"}



Table 2 shows the sets TSD(X), X={A,B,D}. In the case of C, the intersection of its sets
of tsds with respect to the other letters was empty, so we used the alternative
described in 2.3. Figure 4 illustrates the set of tsds obtained for C and their
compatibility with rotations in direction r.



Table 2. TSD sets calculated for A, B and D.

TSD(A/{B,C,D}): {11'}
TSD(B/{A,C,D}): {33^}
TSD(D/{A,B,C}): {01310"}



Note that the presence of the shape can be verified finding only one tsd. This property 
is useful when the shapes are isolated in the scene. Other properties of the tsds can be 
used if noise or occlusions affect the boundary: their absolute frequency, order of 
appearance, connectivity, etc. 




Fig. 4. TSD(C) elements and compatibility with rotations with magnitude r



4 Conclusions 

A new method for description and identification of objects has been introduced. The
concept of Typical Segment Descriptor is defined and its properties are enumerated.
The invariance of TSD to changes in starting point, translation, rotation and scaling,
and its usefulness for partial occlusion detection are explained. The rotation applied to
a known shape in the scene can be detected using the compatibility of the TSD.
Advantages of using TSD instead of chain codes are verified through examples with
an appreciable reduction in the length of the description that will be used during
identification.

Algorithms for the computation of the typical segment descriptors of one shape with
respect to another, and to all shapes of a different class, are proposed. They can be
used when the boundary is encoded using the Freeman chain code or any other
chain code, such as the Vertex Chain Code.

The efficiency of using the TSD approach for shape identification has been shown in 
synthetic scenes, obtaining encouraging results. 

Suggestions for further work include extending the use of typical segment descriptors
to segmentation techniques and studying other properties that could be useful for
detecting changes in scale.

Abstract. In this paper, we introduce a new approach to selecting the best hyperplane
classifier (BHC) from the optimal pairwise linear classifier. We first
propose a procedure for selecting the BHC, and analyze the conditions in which
the BHC is selected. In one of the cases, it is formally shown that the BHC and
Fisher’s classifier (FC) are coincident. An empirical and graphical analysis on
synthetic data and real-life datasets from the UCI machine learning repository is
presented, which involves the optimal quadratic classifier, the BHC, the optimal
pairwise linear classifier, and FC.



1 Introduction 

Linear classifiers have been extensively studied because of their classification speed and
their simplicity of implementation. We consider two classes, c_1 and c_2, which are
represented by two normally distributed d-dimensional random vectors, x_1 ~ N(μ_1, Σ_1)
and x_2 ~ N(μ_2, Σ_2). Thus, the statistical information about the classes is determined
by the mean vectors, μ_1 and μ_2, and the covariance matrices, Σ_1 and Σ_2. We assume
that these parameters are already known, or estimated by using a conventional estimation
method, such as the maximum likelihood estimate (MLE), the Bayesian estimate [4,14],
etc. We also assume that the a priori probabilities of the two classes are equal. When
dealing with two normally distributed random vectors, the general form of the optimal
Bayesian classifier is quadratic. In special cases, the quadratic function can be factored
as a product of two linear functions, as follows:

g_1(x) g_2(x) ≷ 0 (c_1 if the product is positive, c_2 if it is negative), (1)

where g_1(x) = w_1^t x + w_1 and g_2(x) = w_2^t x + w_2.

This is possible when the necessary and sufficient conditions hold [11,12]. Although
(1) is optimal and achieves high classification accuracy, it requires two linear algebraic
operations to classify a single object. We will see later in this paper that using the best
of these two hyperplanes leads to nearly optimal classification.
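
To make the cost argument concrete, the following sketch evaluates a pairwise linear classifier of the form (1). The weights are hypothetical values chosen for illustration and are not derived from any distribution discussed in the paper; the helper names are also assumptions.

```python
def linear(weights, threshold):
    """Return g(x) = w^t x + w_0 as a Python function."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + threshold


def pairwise_classify(g1, g2, x):
    """Classify x with the product rule (1); None marks a tie on a hyperplane."""
    product = g1(x) * g2(x)          # two linear evaluations per sample
    if product > 0:
        return "c1"
    if product < 0:
        return "c2"
    return None


g1 = linear([1.0, 1.0], 0.0)         # hypothetical hyperplane x + y = 0
g2 = linear([1.0, -1.0], 0.0)        # hypothetical hyperplane x - y = 0
labels = [pairwise_classify(g1, g2, x) for x in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0)]]
```

Each classification evaluates both linear functions, whereas a single selected hyperplane needs only one evaluation per sample.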

* Member, IEEE. Partially supported by NSERC, the Natural Science and Engineering Research
Council of Canada.

Various schemes that yield linear classifiers have been reported in the literature,
including Fisher’s classifier [4,6,13], the perceptron algorithm (the basis of the back
propagation neural network learning algorithms) [9], piecewise recognition models [7],
random search optimization [8], removal classification structures [1], adaptive linear
dimensionality reduction [5] (which outperforms Fisher’s classifier for some data sets),
linear constrained distance-based classifier analysis [3] (an improvement to Fisher’s
approach designed for hyperspectral image classification), and recursive Fisher’s
discriminant [2].

Rueda and Oommen [11,12] have recently shown that the optimal classifier between 
two normally distributed classes can be linear even when the covariance matrices are not 
equal. They showed that although the optimal classifier for normally distributed random 
vectors is a second-degree polynomial, this polynomial degenerates to be either a single 
hyperplane or a pair of hyperplanes. In this paper, we introduce a novel approach to 
selecting the best hyperplane classifier (BHC) in the framework of optimal pairwise 
linear classifiers. 

2 Optimal Pairwise Linear Classifiers 

Let x_1 ~ N(μ_1, Σ_1) and x_2 ~ N(μ_2, Σ_2) be two normally distributed random vectors.
The three cases and the conditions in which the optimal classifier is a pair of hyperplanes
are listed below.

Case I: Suppose that*

μ_1 = -μ_2 = [μ_1, ..., μ_d]^t, Σ_1 = I, and Σ_2 = diag(a_1^{-1}, ..., a_d^{-1}) . (2)

The optimal classifier is a pair of hyperplanes if and only if any of the following
conditions is satisfied.

(i) 0 < a_i < 1, a_j > 1, a_k = 1, μ_k = 0, for all k = 1, ..., d,
k ≠ i, k ≠ j, with

a_i(1 - a_j)μ_i^2 + a_j(1 - a_i)μ_j^2 - (1/4)(a_i a_j - a_i - a_j + 1) log(a_i a_j) = 0 . (3)

(ii) a_i ≠ 1, a_j = 1, μ_j = 0, for all j ≠ i .

(iii) a_i = 1, for all i = 1, ..., d .

When d = 2, and the parameters have the form:

μ_1 = -μ_2 = [r s]^t, Σ_1 = I, Σ_2 = diag(a^{-1}, b^{-1}) , (4)

condition (3) becomes:

a(1 - b)r^2 + b(1 - a)s^2 - (1/4)(ab - a - b + 1) log(ab) = 0 . (5)

* In this paper, diag(a_1, ..., a_d) represents a d x d diagonal matrix, whose diagonal
elements are a_1, ..., a_d respectively.



A New Approach That Selects a Single Hyperplane 523 



Case II: Suppose that

μ_1 = [μ_1, ..., μ_i, ..., μ_j, ..., μ_d]^t ,
μ_2 = [μ_1, ..., μ_{i-1}, -μ_i, μ_{i+1}, ..., μ_{j-1}, -μ_j, μ_{j+1}, ..., μ_d]^t , (6)

Σ_1 = diag(a_1^{-1}, ..., a_i^{-1}, ..., a_j^{-1}, ..., a_d^{-1}) , and
Σ_2 = diag(a_1^{-1}, ..., a_j^{-1}, ..., a_i^{-1}, ..., a_d^{-1}) . (7)

The optimal classifier is a pair of hyperplanes if and only if μ_i^2 = μ_j^2.

When d = 2, and the parameters are of the form:

μ_1 = -μ_2 = [r s]^t, Σ_1 = diag(a^{-1}, b^{-1}), and Σ_2 = diag(b^{-1}, a^{-1}) , (8)

the necessary and sufficient condition is r^2 = s^2.

Case III: Suppose that the covariance matrices have the form of (7), and μ_1 = μ_2.
Then, the classifier is always a pair of hyperplanes.



3 Selecting the Best Hyperplane 

First of all, we introduce the following definition, which will be fundamental in the 
criteria for selecting the BHC. 

Definition 1. Let ^(x) be the value resulting from classifying a vector x. The sign of 
g(x), sgn{g, x), is defined as follows: 

{ -1 ifg{^) < 0 

sgn{g,x) = / 0 ifg{x) = 0 (9) 

[ 1 ifgi^) > 0 

In other words, a new sample falls in the “negative” side, sgn(g,x) = —1, or in 
the “positive” side, sgn(p,x) = 1. Ties are resolved arbitrarily, where we assign 0 to 
sgn(g, x) . The criteria for selecting the BHC is based on the result of classifying the two 
means, and uses Definition 1 to evaluate the sign resulting from the classification. 

Rule 1 Letxi ~ N{pi, Si) andTi .2 ^ iV(/X 2 , . 572 ) be two normally distributed random 
vectors, and gi{x)g 2 {'x.) be the optimal pairwise linear classifier. The BHC is selected 
as per the following rule. 

Select: 

• gi, if sgn{gi, p.i) f sgn{gi,iif), 

• gi, ifsgn{g 2 , Bi) ^ sgn{g 2 , B 2 )’ or 

• gi and g 2 , ifsgn{gi,Hi) = sgn{gi, p.^) = 0. □ 

In other words, the BHC is the hyperplane that separates the space into two regions
when the mean vectors are different. One region contains $\boldsymbol{\mu}_1$ and the other contains $\boldsymbol{\mu}_2$.
When the mean vectors are coincident, both $g_1$ and $g_2$ are the best classifiers, and hence
both must be selected.
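
The selection described by Definition 1 and Rule 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sgn` and `select_bhc`, the tolerance parameter `tol`, and the numeric values of the means are hypothetical, and the two linear functions are taken from the statements of Theorems 2 and 4 of this section.

```python
def sgn(g, x, tol=0.0):
    """Sign of g(x) as in Definition 1: -1, 0 or 1.

    Values with |g(x)| <= tol are treated as ties (an added convenience
    for floating-point inputs; tol = 0 gives the exact definition).
    """
    value = g(x)
    if value < -tol:
        return -1
    if value > tol:
        return 1
    return 0


def select_bhc(g1, g2, mu1, mu2, tol=0.0):
    """Apply Rule 1 and return the names of the selected linear functions."""
    if sgn(g1, mu1, tol) != sgn(g1, mu2, tol):
        return ["g1"]
    if sgn(g2, mu1, tol) != sgn(g2, mu2, tol):
        return ["g2"]
    if sgn(g1, mu1, tol) == 0 and sgn(g1, mu2, tol) == 0:
        return ["g1", "g2"]
    return []  # situation not covered by Rule 1


# Setting of Theorem 2 with r = s = 1 (illustrative values).
g1 = lambda p: p[0] + p[1]
g2 = lambda p: p[0] - p[1]
print(select_bhc(g1, g2, (1.0, 1.0), (-1.0, -1.0)))

# Setting of Theorem 4 with coincident means [r, s] = [1, 2] (illustrative).
r, s = 1.0, 2.0
h1 = lambda p: -p[0] - p[1] + (r + s)
h2 = lambda p: p[0] - p[1] + (s - r)
print(select_bhc(h1, h2, (r, s), (r, s)))
```

In the first call the two means fall on opposite sides of the first line only, so a single function is returned; in the second call both functions vanish at the common mean, so both are kept.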

We now analyze the conditions for selecting the BHC for the case in which the 
covariance matrices are the identity and a diagonal matrix respectively (Case I). The 
formal proof of the result can be found in [10].

Theorem 1. Let $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathbf{x}_2 \sim N(\boldsymbol{\mu}_2, \Sigma_2)$ be two normally distributed
random vectors, where the parameters have the form of (4), and

$g_1:\; y = \alpha(a - 1)x + \alpha(a + 1)r - \beta s$, and (10)

$g_2:\; y = \alpha(1 - a)x - \alpha(a + 1)r - \beta s$, (11)

be the linear functions (in their explicit form) composing the optimal pairwise linear
classifier, where $\alpha = \sqrt{\dfrac{1}{(a-1)(1-b)}}$ and $\beta = \dfrac{b+1}{b-1}$.

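The linear function $g_1(\mathbf{x})$ is selected as per Rule 1, if:

$\kappa \in (-r, r)$ when $g_1(\boldsymbol{\mu}_1) > 0$ and $g_1(\boldsymbol{\mu}_2) < 0$, (12)

$\kappa \in (r, -r)$ when $g_1(\boldsymbol{\mu}_1) < 0$ and $g_1(\boldsymbol{\mu}_2) > 0$, (13)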



where k = S\ 

Conversely, $g_2(\mathbf{x})$ is selected when $\kappa$ is outside the intervals.



The extension of Theorem 1 to d-dimensional normally distributed random vectors, 
where d > 2, is straightforward. The conditions for which the BHC is selected are similar 
to those of the two-dimensional case. The formal proof for the result can be found in
[10].

We now analyze another case (Case II) in which the mean vectors are in opposite 
directions and the diagonal covariance matrices have the two elements of their diagonal 
switched. 



Theorem 2. Let $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathbf{x}_2 \sim N(\boldsymbol{\mu}_2, \Sigma_2)$ be two normally distributed
random vectors whose parameters are of the form of (8). The BHC is always:

$g_1(\mathbf{x}) = x + y = 0$ if $r = s$, and (14)

$g_2(\mathbf{x}) = x - y = 0$ if $r = -s$. (15)



The formal proof of this theorem can be found in [10]. The extension to d-dimensional
normally distributed random vectors, where $d > 2$, can be derived by replicating the
steps of the proof of Theorem 2, and substituting $r$ and $s$ for $\mu_i$ and $\mu_j$ respectively. The
formalization of the result is stated and proved in [10].

We now show that for the case discussed above, i.e. when the two distributions have 
mean vectors of the form of (6), and covariance matrices of the form of (7), the BHC 
is identical to Fisher’s classifier. In the theorem below [10], we show the result for 
d-dimensional normally distributed random vectors, where d > 2. 

Theorem 3. Let $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathbf{x}_2 \sim N(\boldsymbol{\mu}_2, \Sigma_2)$ be two normally distributed
random vectors whose mean vectors and covariance matrices have the form of (6) and
(7) respectively. The BHC is identical to Fisher’s classifier.

The third case that we consider is when we deal with two normally distributed 
random vectors whose covariance matrices have the form of (7), and their mean vectors 
are coincident. This case is the generalized Minsky’s paradox for the perceptron. The 
result for two-dimensional normally distributed random vectors is stated as follows, and 
the proof is available in [10]. 






Theorem 4. Let $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathbf{x}_2 \sim N(\boldsymbol{\mu}_2, \Sigma_2)$ be two normally distributed
random vectors whose covariance matrices have the form of (8), and whose mean vectors
have the form $\boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$. The BHC is composed of two linear functions:

$g_1(\mathbf{x}) = -x - y + (r + s)$, and $g_2(\mathbf{x}) = x - y + (s - r)$. (16)

The generalization of the result above for d-dimensional normally distributed random
vectors, where $d > 2$, follows the same steps of the proof of Theorem 4. The details
of the proof are found in [10]. The result of Theorem 4 is quite useful in deciding
which linear function should be selected as the BHC, a single hyperplane or the pair of
hyperplanes composing the optimal pairwise linear classifier. Indeed, the case in which
the distributions have coincident means rarely occurs in real-life scenarios.

The extension of the BHC classifier for more than two classes is straightforward. It 
can be achieved by deriving the BHC for each pair of classes. Then, the classification 
is performed by using the Voronoi diagram constructed using all the “inter-class” BHC 
classifiers. How this framework works in real-life scenarios is a problem that we are 
currently investigating. 

4 Classification Accuracy and Speed 

In order to test the accuracy and speed of the BHC and two other linear classifiers, we have
performed some simulations for the different cases discussed in Section 3. We chose the
dimensions $d = 2$ and $d = 3$ and trained our classifier using 100 randomly generated
training samples, each sample represented by a two- or three-dimensional vector. For
each case, we considered two classes, $c_1$ and $c_2$, which are represented by two normally
distributed random vectors, $\mathbf{x}_1 \sim N(\boldsymbol{\mu}_1, I)$ and $\mathbf{x}_2 \sim N(\boldsymbol{\mu}_2, \Sigma_2)$ respectively, where $I$
is the identity.

The first case that we analyze consists of two examples that instantiate two-dimensional
normally distributed random vectors, 2DD-1 and 2DD-2, whose mean vectors and
covariance matrices satisfy the conditions of (4). The parameters are $\boldsymbol{\mu}_1 = -\boldsymbol{\mu}_2 \approx
[0.747, 1.914]^T$, $\Sigma_2 \approx \mathrm{diag}(0.438, 5.827)$, and $\boldsymbol{\mu}_1 = -\boldsymbol{\mu}_2 \approx [-1.322, -1.034]^T$,
$\Sigma_2 \approx \mathrm{diag}(2.126, 0.205)$ respectively.
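
As a rough sanity check of these parameters, the sketch below tests whether the quadratic decision boundary between the two Gaussians degenerates into a pair of straight lines. It rests on assumptions that are not stated in this section: equal a priori probabilities, the boundary taken as the zero set of the difference of the two log-densities, and the standard fact that a conic $Ax^2 + Cy^2 + Dx + Ey + F = 0$ is degenerate exactly when the determinant of its $3 \times 3$ matrix vanishes. The function name `conic_determinant` is illustrative.

```python
import math


def conic_determinant(mu1, var1, mu2, var2):
    """Determinant of the conic describing the quadratic boundary in 2D.

    mu1, mu2: mean vectors; var1, var2: diagonals of the covariance matrices.
    Assumes equal priors, so the boundary is Q(x, y) = 0 with
    Q = (x-mu1)' S1^-1 (x-mu1) + ln|S1| - (x-mu2)' S2^-1 (x-mu2) - ln|S2|.
    """
    p1 = [1.0 / v for v in var1]  # precisions of class 1
    p2 = [1.0 / v for v in var2]  # precisions of class 2
    # Q(x, y) = A x^2 + C y^2 + D x + E y + F (no xy term: diagonal covariances)
    A = p1[0] - p2[0]
    C = p1[1] - p2[1]
    D = -2.0 * (p1[0] * mu1[0] - p2[0] * mu2[0])
    E = -2.0 * (p1[1] * mu1[1] - p2[1] * mu2[1])
    F = sum(p1[i] * mu1[i] ** 2 - p2[i] * mu2[i] ** 2 for i in range(2))
    F += math.log(var1[0] * var1[1]) - math.log(var2[0] * var2[1])
    # det [[A, 0, D/2], [0, C, E/2], [D/2, E/2, F]]
    return A * C * F - (A * E ** 2 + C * D ** 2) / 4.0


# Examples 2DD-1 and 2DD-2 with the rounded parameters quoted in the text.
det_1 = conic_determinant((0.747, 1.914), (1.0, 1.0),
                          (-0.747, -1.914), (0.438, 5.827))
det_2 = conic_determinant((-1.322, -1.034), (1.0, 1.0),
                          (1.322, 1.034), (2.126, 0.205))
print(det_1, det_2)
```

Because the quoted parameters are rounded to three decimals, the determinants are only expected to be close to zero, not exactly zero. Since $A \cdot C < 0$ in both examples, a vanishing determinant corresponds to two intersecting lines.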

The second case tested in our simulations is an example of two three-dimensional
normally distributed random vectors, 3DD-1, whose covariance matrices and mean vectors,
which satisfy the constraints of (2), are $\boldsymbol{\mu}_1 = -\boldsymbol{\mu}_2 \approx [0.855, 1.776, 0]^T$ and
$\Sigma_2 \approx \mathrm{diag}(0.562, 3.842, 1)$.

In each case, the OPLC was obtained using the methods described in [11,12]. The 
BHC was obtained by invoking Rule 1 introduced in Section 3, and Fisher’s classifier 
(FC) was obtained using the method described in [4]. 

To test the classifiers, we then generated ten sets each containing 100 random samples
for each class using the original parameters. The results obtained after testing the three
classifiers are shown in Table 1. The classification accuracy was computed as the average
of the percentage of testing samples that were correctly classified for each of the ten
data sets. In addition, for each individual data set, the average between the classification
accuracies for classes $c_1$ and $c_2$ was computed. The classification speed represents the
average number of CPU seconds taken to classify 100 testing samples.

Table 1. Classification accuracy and speed obtained after testing three linear classifiers, OPLC,
BHC and FC, on randomly generated data sets.

Example   OPLC                    BHC                     FC
          Accuracy (%)  Speed (s) Accuracy (%)  Speed (s) Accuracy (%)  Speed (s)
2DD-1     92.15         4.48      91.85         1.75      91.15         1.80
2DD-2     96.20         4.38      96.15         1.73      95.15         1.79
3DD-1     93.40         4.57      93.30         2.11      92.35         1.82
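
The evaluation protocol described above (ten test sets, 100 samples per class, per-class accuracies averaged) can be sketched as follows. This is a minimal illustration, not the code used for Table 1: the parameter values are hypothetical instances of the form (8) with $r = s = 1$, $a = 2$ and $b = 0.5$, the classifier is the hyperplane $x + y = 0$ given by Theorem 2 for $r = s$, and the helper names are made up for the example.

```python
import random


def sample(mu, var, n, rng):
    """Draw n points from a normal distribution with diagonal covariance."""
    return [tuple(rng.gauss(m, v ** 0.5) for m, v in zip(mu, var))
            for _ in range(n)]


def average_accuracy(classify, params, n_sets=10, n_per_class=100, seed=0):
    """Mean over the test sets of the average per-class accuracy (in [0, 1])."""
    rng = random.Random(seed)
    per_set = []
    for _ in range(n_sets):
        class_acc = []
        for label, (mu, var) in params.items():
            points = sample(mu, var, n_per_class, rng)
            hits = sum(1 for p in points if classify(p) == label)
            class_acc.append(hits / n_per_class)
        per_set.append(sum(class_acc) / len(class_acc))
    return sum(per_set) / n_sets


# Hypothetical parameters: class label -> (mean vector, covariance diagonal).
params = {1: ((1.0, 1.0), (0.5, 2.0)), 2: ((-1.0, -1.0), (2.0, 0.5))}

# BHC for r = s according to Theorem 2: the line x + y = 0.
bhc = lambda p: 1 if p[0] + p[1] > 0 else 2

accuracy = average_accuracy(bhc, params)
print(f"{100 * accuracy:.2f}%")
```

Timing could be added around the classification loop with `time.process_time()` if CPU seconds per 100 samples are of interest.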



For the first two examples, 2DD-1 and 2DD-2, the classification accuracy of the BHC
is very close to that of the OPLC, and higher than that of FC. For the three-dimensional 
example, 3DD-1, we again observe the superiority of the BHC over FC. We also see that 
the BHC attains nearly optimal classification - just 0.1% less than the optimal classifier, 
OPLC. The BHC and FC are more than twice as fast as the OPLC, and both the BHC 
and FC achieve comparable speed rates. In Figure 1 the BHC, the OPLC, FC, and the 
samples of one of the testing data sets for each class, are plotted. It is clearly seen that FC 
misclassified objects which are in a region where the samples are more likely to occur. 
Similar plots for the other two examples are available in [10]. 




Fig. 1. Testing samples and the corresponding classifiers for two-dimensional normally distributed 
random vectors whose parameters are those of Example 2DD-1. 



We also conducted experiments on real-life datasets. For the training and classifica- 
tion tasks we have composed 10 data subsets with all possible pairs of features obtained 
from the first five numeric features. For each of the pairs we composed the training set 
and the testing set by drawing samples without replacement from the original datasets.

The OQC and FC have been trained by invoking the traditional maximum likelihood
method (MLE) [4,14]. The OPLC and the BHC have been trained by following the 
procedure described in [10], thus yielding the approximated pairwise linear classifier, 
and subsequently the best hyperplane, for each subset. 

The classification of each object was performed using the classifiers mentioned 
above, and invoking a voting scheme, which assigns the class in which the sample 
yielded a positive result for the majority of voters. Ten voting rounds were invoked 
(one for each pair of features), and thus, the majority for class ci was chosen to be five 
or more voters. From the WDBC dataset, we randomly selected, without replacement, 
100 samples for training, and 100 samples for the testing phase for each class. The 
classification accuracies obtained from testing the OQC, the OPLC, the BHC and FC are
shown in Table 2. The results in the table show that, using the voting scheme, the OQC is,
as expected, more accurate than the other classifiers. We also observe that the OPLC and
the BHC (both achieving the same classification accuracy) lead to higher classification
accuracy than FC. When considering the pair-based classification, the averages in the
fifth column show that the OQC was the most accurate classifier. In this scheme, the
BHC outperformed the OPLC, and FC was the least accurate classifier. We also observe
that on the WDBC, the OPLC and the BHC achieve nearly optimal classification. Similar
results that show the efficiency of the BHC, and a graphical analysis on real-life data are
available in [10].
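
The voting scheme can be illustrated with a short sketch. It assumes, as described above, one voter per pair of features taken from the first five numeric features and a threshold of five votes for class $c_1$; the function name `vote` and the class labels are illustrative.

```python
from itertools import combinations

# One voter per pair of features drawn from the first five numeric features.
FEATURES = range(5)
PAIRS = list(combinations(FEATURES, 2))  # 10 pairs, hence 10 voting rounds


def vote(decisions, threshold=5, positive="c1", negative="c2"):
    """Combine one decision per voter; `positive` wins with >= threshold votes."""
    positives = sum(1 for d in decisions if d == positive)
    return positive if positives >= threshold else negative


# Hypothetical decisions of the ten pair-based classifiers for one sample.
decisions = ["c1", "c2", "c1", "c1", "c2", "c1", "c2", "c1", "c2", "c2"]
print(len(PAIRS), vote(decisions))
```

With ten voters, a five-to-five tie is resolved in favour of $c_1$, which matches the threshold stated in the text.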



Table 2. Classification accuracy (%) obtained from testing the classifiers on the WDBC data set.

Classifier   Benign   Malignant   Avg. (voting)   Avg. (pair)
OQC          96.00    87.00       91.50           88.45
OPLC         95.00    86.00       90.50           87.85
BHC          95.00    86.00       90.50           88.05
FC           93.00    85.00       89.00           86.30



5 Conclusions 

In this paper, we presented an approach that selects the best hyperplane classifier (BHC) 
from the optimal pairwise linear classifier (OPLC). We first introduced the criteria for 
selecting the BHC given the OPLC. We then formalized the conditions for selecting the 
BHC for three cases. In the second case (the most general scenario for multi-dimensional 
random vectors), we have shown that the BHC is identical to Fisher’s classifier (FC). 

The efficiency of the BHC, the OPLC and FC has been evaluated in terms of classi- 
fication accuracy and speed. In terms of accuracy, we have shown that the BHC is nearly 
optimal, and in some cases, it achieves the same accuracy as FC. The empirical results 
on real-life datasets show that the OPLC and the BHC attained similar classification
accuracy, and that the BHC is superior to FC on the WDBC dataset. The graphical analysis
corroborates this relation.

The extension of the BHC to d-dimensional random vectors, where $d > 2$, is far
from trivial, as it involves deriving an MLE method for the constrained pairwise linear
classifier. How this MLE is designed, and how the corresponding BHC is derived, is a
problem that is currently being undertaken.

Abstract. For a given planar region $P$ its discretization on a discrete
planar point set $S$ consists of the points from $S$ which fall into $P$. If
$P$ is bounded with a convex polygon having $n$ vertices and the number
of points from $P \cap S$ is finite, the obtained discretization of $P$ will be
called discrete convex n-gon.

In this paper we show that discrete moments having the order up to
$n$ characterize uniquely the corresponding discrete convex n-gon if the
discretizing set $S$ is fixed. In this way, as an example, the matching of
discrete convex n-gons can be done by comparing $\frac{1}{2} \cdot (n+1) \cdot (n+2)$
discrete moments, which can be much more efficient than the comparison
“point-by-point” since a digital convex n-gon can consist of an arbitrarily
large number of points.

Keywords: Discrete shape, coding, moments, pattern matching. 



1 Introduction 

It has been known for many years that moments are good descriptors of real shapes
([4,8]). They are used in many computer vision, image processing, and pattern
recognition tasks ([2,5]). For simple “continuous” shapes such as lines, circles,
ellipses, ... a finite set of moments is sufficient for recovering the original shape
- usually, it is enough to solve a system of equations. If we try to reconstruct
regions bounded by convex polygons (triangles, quadrangles, ...) the problem of
reconstruction of the original shape from a set of moments becomes more complicated
but still solvable. The complication comes from the fact that there are
no suitable equations for the boundaries of such polygonal convex regions.

But, in computer applications of the “moment techniques” we manipulate
mostly discrete data - not real objects described by their equations.
In areas such as pattern recognition, pattern classification, (digital) image
analysis, etc., real objects are replaced with their discretizations - i.e., they are
represented by finite point sets which are obtained by some discretizing process.
It implies that there are infinitely many real shapes with the same discretization.

* The author is also with the Mathematical Institute of Serbian Academy of Sciences
and Arts, Belgrade.

a) random grid b) hexagonal grid c) squared (integer) grid

Fig. 1. Discretizations of isometric shapes A (points labeled by a) and B (points labeled
by b) on three different discretizing sets are shown. In all presented cases, the
discretizations of A and B are non-isometric, but discretizations of A coincide with the
discretizations of a given ellipse E.



So, even if we know an equation of the real object (which usually does not happen
in the mentioned research areas), it is not suitable to represent the discrete
image of a given real object by the corresponding equation because it can
happen that the considered digital object has many different characterizations
(see discretizations of the shape A from Fig. 1). Moreover, isometric planar
regions may have non-isometric discretizations on the same discretizing set (see
discretizations of the shapes A and B from Fig. 1), which also shows that the
use of the “original objects” (sometimes called preimages) for a characterization
of a given discrete object could be inappropriate.

Here, we prove that discrete moments having the order up to $n$ are enough
for a unique characterization of discrete convex n-gons presented on a fixed
discrete set. In this way, as an example, a fast comparison between discrete
convex n-gons is enabled. In case of a relatively small number of edges of digitized
convex polygons with respect to the number of sample points, the comparison of
the corresponding discrete moments can be much faster than the comparison
“point-by-point”.

We conclude this introduction with the basic definitions and denotations. 

By discrete convex n-gon we mean the discretization of a planar region which
is bounded by a convex n-gon. Formally, a discrete convex n-gon $D(P)$ (see
Fig. 2) from a fixed discrete point set $S$ is defined as

$D(P) = \{(x, y) \mid (x, y) \in P \cap S, \text{ the boundary of region } P \text{ is a convex } n\text{-gon}\}.$

Throughout the paper, it will be assumed, without being mentioned, that every
discrete convex n-gon that appears consists of a finite number of points. For illustration,
the discretizations on the set consisting of all points with rational coordinates
(i.e., $S = \mathbb{Q}^2$) are not considered.

Fig. 2. (a) Discretizations of non-isometric quadrangles ABCD and PQRS are identical
and they consist of 12 numbered points. (b) During the discretization process
the vertices A, B, C, and D (i.e., P, Q, R, and S) are usually unknown. We can only
manipulate the obtained discrete set (i.e., discrete 4-gon).

Since the characterization of discrete convex n-gons described here is based
on a suitable use of discrete moments we give a precise definition. The discrete
moment $\mu_{p,q}(X)$ of a finite point set $X$ is:

$$\mu_{p,q}(X) = \sum_{(x,y) \in X} x^p \cdot y^q.$$

The moment $\mu_{p,q}(X)$ has the order $p + q$. In the rest of the paper it will be
assumed (even if not mentioned) that $p$ and $q$ are nonnegative integers. The set of
nonnegative integers is denoted by $\mathbb{N}_0$.
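
The definition translates directly into code. The sketch below is illustrative only: the names `moment` and `moment_signature` are not from the paper, integer coordinates are assumed so that the arithmetic is exact, and the two point sets are made-up examples on the integer grid. The signature lists all moments of order up to $n$, of which there are $\frac{1}{2}(n+1)(n+2)$.

```python
def moment(points, p, q):
    """Discrete moment mu_{p,q}(X) = sum over (x, y) in X of x^p * y^q."""
    return sum(x ** p * y ** q for x, y in points)


def moment_signature(points, n):
    """All discrete moments of order 0..n, ordered by increasing order."""
    return tuple(moment(points, p, order - p)
                 for order in range(n + 1)
                 for p in range(order + 1))


# Two small point sets on the integer grid (illustrative data).
square = {(0, 0), (1, 0), (0, 1), (1, 1)}
triangle = {(0, 0), (1, 0), (0, 1)}

n = 4
sig_square = moment_signature(square, n)
sig_triangle = moment_signature(triangle, n)
print(len(sig_square), sig_square == sig_triangle)
```

Comparing two signatures takes a number of comparisons that depends only on $n$, not on how many points the sets contain; that equal signatures imply equal sets is what the paper proves for discrete convex n-gons on a fixed set $S$.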

Throughout the paper a finite set means a set consisting of a finite number of
points. Also, a unique characterization and a coding will have the same meaning.

We shall say that a continuous function $z = f(x, y)$ separates sets $A$ and
$B$ if the sign of $f(x, y)$ in the points from $A$ differs from the sign of $f(x, y)$
in the points from $B$. Precisely, it is

either $A \subset \{(x, y) \mid f(x, y) > 0\}$ and $B \subset \{(x, y) \mid f(x, y) < 0\}$,

or $A \subset \{(x, y) \mid f(x, y) < 0\}$ and $B \subset \{(x, y) \mid f(x, y) > 0\}$.

Some examples are given in Fig. 3.
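
A direct check of this definition for finite sets might look as follows. The function name `separates` and the point sets are illustrative; the three functions mimic the situation of Fig. 3(a), where a line and a circle each separate the sets but their product does not.

```python
def separates(f, A, B):
    """True if f is strictly positive on one set and strictly negative on the other."""
    values_a = [f(x, y) for x, y in A]
    values_b = [f(x, y) for x, y in B]
    a_pos = all(v > 0 for v in values_a)
    a_neg = all(v < 0 for v in values_a)
    b_pos = all(v > 0 for v in values_b)
    b_neg = all(v < 0 for v in values_b)
    return (a_pos and b_neg) or (a_neg and b_pos)


# Illustrative point sets.
A = [(0, 0), (1, 0)]
B = [(3, 3), (4, 3)]

line = lambda x, y: x + y - 3                    # linear function
circle = lambda x, y: (x - 0.5) ** 2 + y ** 2 - 4  # quadratic function
product = lambda x, y: line(x, y) * circle(x, y)

print(separates(line, A, B), separates(circle, A, B), separates(product, A, B))
```

The product fails to separate because both factors change sign between the two sets, so their product has the same sign on both.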

2 Characterization of Discrete Convex n-gons 

In this section it will be shown that the discrete moments having order up to $n$
match uniquely the discretized polygonal convex n-gon presented on a fixed set
$S$. We start with the following theorem.

Fig. 3. (a) $z = c \cdot x + d \cdot y + e$ separates discrete sets A (points labeled by a) and
B (points labeled by b). $z = (x - f)^2 + (y - g)^2 - h$ also separates A and B. But,
$z = (c \cdot x + d \cdot y + e) \cdot ((x - f)^2 + (y - g)^2 - h)$ does not separate A and B.

(b) $z = f(x, y)$ does not separate the sets A (points labeled by a) and B (points labeled
by b). $z = g(x, y)$ also does not separate A and B. But, $z = f(x, y) \cdot g(x, y)$ separates
A and B.



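Theorem 1. Let $A_1$ and $A_2$ be two finite planar sets. If there exists a
function of the form

$$f(x, y) = \sum_{p+q \le k} \alpha_{p,q} \cdot x^p \cdot y^q, \quad \text{where } p, q \in \mathbb{N}_0 \qquad (1)$$

which separates $A_1 \setminus A_2$ and $A_2 \setminus A_1$ then

$$\mu_{p,q}(A_1) = \mu_{p,q}(A_2) \quad \text{with } p, q \in \mathbb{N}_0 \text{ and } p + q \le k$$

is equivalent to $A_1 = A_2$.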

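Proof. If $A_1 = A_2$ then the corresponding discrete moments are obviously equal.
It remains to prove that the equality of the corresponding moments of the order
up to $k$ implies $A_1 = A_2$. We prove that $A_1 \neq A_2$ and

$$\sum_{(x,y) \in A_1} x^p \cdot y^q = \mu_{p,q}(A_1) = \mu_{p,q}(A_2) = \sum_{(x,y) \in A_2} x^p \cdot y^q$$

(satisfied for all $p, q \in \mathbb{N}_0$ with $p + q \le k$) lead to a contradiction. Since
$A_1 \neq A_2$ we can assume $A_1 \setminus A_2 \neq \emptyset$ (else we can start with $A_2 \setminus A_1 \neq \emptyset$).
Further, because there exists a function $f(x, y) = \sum_{p+q \le k} \alpha_{p,q} \cdot x^p \cdot y^q$ which
separates $A_1 \setminus A_2$ and $A_2 \setminus A_1$, we can assume (for instance) $f(x, y) > 0$ if
$(x, y) \in A_1 \setminus A_2$, while $(x, y) \in A_2 \setminus A_1$ implies $f(x, y) < 0$. Now, we are able
to derive the contradiction $0 < 0$ which finishes the proof.

$$0 < \sum_{(x,y) \in A_1 \setminus A_2} f(x, y) - \sum_{(x,y) \in A_2 \setminus A_1} f(x, y)$$

$$= \sum_{p+q \le k} \alpha_{p,q} \Big( \sum_{(x,y) \in A_1 \setminus A_2} x^p \cdot y^q + \sum_{(x,y) \in A_1 \cap A_2} x^p \cdot y^q \Big) - \sum_{p+q \le k} \alpha_{p,q} \Big( \sum_{(x,y) \in A_2 \setminus A_1} x^p \cdot y^q + \sum_{(x,y) \in A_1 \cap A_2} x^p \cdot y^q \Big)$$

$$= \sum_{p+q \le k} \alpha_{p,q} \cdot \sum_{(x,y) \in A_1} x^p \cdot y^q - \sum_{p+q \le k} \alpha_{p,q} \cdot \sum_{(x,y) \in A_2} x^p \cdot y^q = 0. \qquad ∎$$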

Next, we prove the main result of the paper. 

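Theorem 2. Fix a discrete point set $S$. Let discrete convex n-gons
$D(P), D(P_1) \subset S$ be given. Then

$$\mu_{p,q}(D(P)) = \mu_{p,q}(D(P_1)) \quad \text{with } p, q \in \mathbb{N}_0 \text{ and } p + q \le n$$

is equivalent to $D(P) = D(P_1)$.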



Proof. Let $D(P)$ and $D(P_1)$ be different discrete convex n-gons. We will show
that there always exists a separating function of the form (1) (with $k \le n$) which
separates $D(P) \setminus D(P_1)$ and $D(P_1) \setminus D(P)$. Then, the statement will follow
directly from Theorem 1, specifying $A_1 = D(P)$ and $A_2 = D(P_1)$.

Since $D(P) \neq D(P_1)$ and $\mu_{0,0}(D(P)) = \mu_{0,0}(D(P_1))$ are assumed, we
have $D(P) \setminus D(P_1) \neq \emptyset$ and $D(P_1) \setminus D(P) \neq \emptyset$.

For convenience and without loss of generality we can assume that $P$ and
$P_1$ do not have common vertices and common edges, and also that there is no
edge of $P$ (i.e., $P_1$) which belongs to the boundary of $P_1$ (i.e., $P$) - such an
assumption is possible because $D(P)$ and $D(P_1)$ are finite point sets.
Let us consider the set-intersection of $P$ and $P_1$ and let



be the vertices of $P \cap P_1$ listed in the counterclockwise order and denoted in
such a way that the line segments






$[A_1, A_2], \ldots, [A_{i_1 - 1}, A_{i_1}]$






belong to the boundary of $P$, while

belong to the boundary of $P_1$. Further, let

- ai • x + /3i • y — 7i = 0 be the line l\ determined by A\ and 

- tt2 • cc + /?2 • y — 72 = 0 be the line I 2 determined by Ai^+i and Ai^] 



- ak ■ X + Pk ■ y — "/k = 0 be the line Ik determined by and 

Then we will show that the function

$$f(x, y) = \prod_{i=1}^{k} (\alpha_i \cdot x + \beta_i \cdot y - \gamma_i) \qquad (2)$$

separates the set-differences $D(P) \setminus D(P_1)$ and $D(P_1) \setminus D(P)$.

Namely, since the points 

are successive intersection points of the boundaries of P and Pi, we have: 

i) for any $i$ from $\{1, 2, \ldots, k\}$, all points from $D(P) \setminus D(P_1)$ belong to the
same half-plane determined by the line $l_i$ - consequently, $f(x, y)$ in all
points from $D(P) \setminus D(P_1)$ takes the same sign;
ii) for any point $X$ from $D(P_1) \setminus D(P)$ there is exactly one integer $i$ from
$\{1, 2, \ldots, k\}$ such that $l_i$ separates $X$ from $D(P) \setminus D(P_1)$. In other words,
the function $f(x, y)$ takes the same sign in all points from $D(P_1) \setminus D(P)$
and the sign differs from the sign taken in the points from $D(P) \setminus D(P_1)$.

The items i) and ii) imply that $f(x, y)$ is a separating function for
$D(P) \setminus D(P_1)$ and $D(P_1) \setminus D(P)$.

Let us mention here that another separating function for the same set-differences
is

$$f(x, y) = \prod_{i=1}^{k} (\tilde{\alpha}_i \cdot x + \tilde{\beta}_i \cdot y - \tilde{\gamma}_i) \qquad (3)$$

where 

- di • cc -I- /?i • y — 7i = 0 is the line p determined by Bi and 

- d2 • a: -I- /?2 • y — 72 = 0 is the line I 2 determined by Pji+i and Bj^-, 



- ctk ■ X + Pk ■ y — pk = 0 is the line Ik determined by and Bj^. 

Due to Theorem 1, the existence of a function of the form (1) which separates
$D(P) \setminus D(P_1)$ and $D(P_1) \setminus D(P)$ (we can take either (2) or (3)) completes the
proof. ∎

The previous proof is illustrated by Fig. 4. In the given example $n = 7$, $k = 3$,
and two separating functions are described in the caption of the figure.

3 Comments and Conclusion

In this paper we consider finite point subsets of a fixed discrete point
set $S$ which are occupied by planar regions whose boundaries are convex n-gons.
Such sets are called discrete convex n-gons. Throughout the manuscript, no
assumption is made about the structure of $S$. We derive a result which shows that
there are no two different discrete convex n-gons whose corresponding discrete
moments of the order up to $n$ coincide. Or, more formally, the mapping

$$D(P) \to \big( \mu_{0,0}(D(P)),\ \mu_{0,1}(D(P)),\ \mu_{1,0}(D(P)),\ \ldots,\ \mu_{0,n}(D(P)),\ \mu_{1,n-1}(D(P)),\ \ldots,\ \mu_{n,0}(D(P)) \big)$$

is one-to-one while $D(P)$ belongs to the set of digital convex n-gons.

A precise performance analysis could not be given since there is no assumption
about the structure of $S$. In any case, it can be said that the result enables
the matching of discrete convex n-gons by comparing $\frac{1}{2}(n+1)(n+2)$ numbers (which
are all discrete moments of the order up to $n$) instead of comparing “point-by-point”.
Obviously, the comparison “point-by-point” can be very expensive



Fig. 4. $f(x, y) = \prod_{i=1}^{3} (\alpha_i \cdot x + \beta_i \cdot y - \gamma_i)$ is a separating function for $D(P) \setminus D(P_1)$
(points labeled by a), and $D(P_1) \setminus D(P)$ (points labeled by b).

In accordance with denotations from the proof of Theorem 2, Bi = A^, B 2 = A@, 
B3 = As, B5 = Ag, Be = Aio, and Br = Ai. The function which is the product of the 
linear functions corresponding to the lines determined by the pairs of points $(B_1, B_2)$, $(B_3, B_5)$,
and $(B_6, B_7)$ is also a function which separates $D(P) \setminus D(P_1)$ and $D(P_1) \setminus D(P)$.

because the discrete convex n-gons may consist of an arbitrarily large number of
points. In areas of practical applications $S$ can correspond to the set of sensors or
to the set of reference points whose choice depends on the nature of the application.

Of course, a more precise analysis can be done in the area of digital image
analysis - i.e., if $S$ is an integer (squared) grid (it is planned as future work).
In the literature, the problems related to digital squares are already considered.
In [3,9] the recognition problem is studied.

It is worth mentioning that the representation (coding) problem for digital
convex polygons from an integer grid can be solved by decomposing the boundary
of the considered digital polygon into digital straight line segments and, after
that, characterizing (coding) the obtained straight line segments. Both efficient
algorithms for the decomposition of digital curves into maximal straight line segments
and efficient coding schemes for digital straight line segments already exist
- see [6,10] and [1,7]. Let us notice that the coding scheme presented here is
expected to be more robust because it is not based on the boundary points only,
as it would be in the case of the coding based on the boundary decomposition
into digital straight line segments. Also, such a coding procedure could not be
applied to discrete convex polygons from an arbitrary discretizing set.

Abstract. This paper describes the application of a new model to learn
probabilistic context-free grammars (PCFGs) from a tree bank corpus. The
model estimates the probabilities according to a generalized k-gram scheme
for trees. It allows for faster parsing, decreases considerably the perplexity
of the test samples and tends to give more structured and refined parses.
In addition, it also allows several smoothing techniques such as backing-off
or interpolation that are used to avoid assigning zero probability to any
sentence.



1 Introduction 

Context-free grammars may be considered to be the customary way of representing
syntactical structure in natural language sentences. In many natural-language
processing applications, obtaining the correct syntactical structure for
a sentence is an important intermediate step before assigning an interpretation
to it. But ambiguous parses are very common in real natural-language sentences
(e.g., those longer than 15 words). A set of rather radical hypotheses as to how
humans select the best parse tree [1] propose that a great deal of syntactic
disambiguation may actually occur without the use of any semantic information;
that is, just by selecting a preferred parse tree. It may be argued that the
preference of a parse tree with respect to another is largely due to the relative
frequencies with which those choices have led to a successful interpretation. This
sets the ground for a family of techniques which use a probabilistic scoring of
parses to select the correct parse in each case.

Probabilistic scorings depend on parameters which are usually estimated
from data, that is, from parsed text corpora such as the Penn Treebank [2]. The
most straightforward approach is that of treebank grammars [3]. Treebank grammars
are probabilistic context-free grammars in which the probability that a
particular nonterminal is expanded according to a given rule is estimated as the
relative frequency of that expansion by simply counting the number of times it
appears in a manually-parsed corpus. This is the simplest probabilistic scoring
scheme, and it is not without problems; we will show how a set of approximate
models, which we will call offspring-annotated models, in which expansion
probabilities are dependent on the future expansions of children, may be seen
as a generalization of the classic k-gram models to the case of trees, and include
treebank grammars as a special case; other models, such as Johnson's
[4] parent-annotated models (or more generally, ancestry-annotated models) and
IBM history-based grammars [5, p. 423], [6] offer an alternate approach in which
the probability of expansion of a given nonterminal is made dependent on the
previous expansions. An interesting property of many of these models is that,
even though they may be seen as context-dependent, they may still be easily
rewritten as context-free models in terms of specialized versions of the original
nonterminals.

* The authors wish to thank the Generalitat Valenciana project CTIDIB/2002/173 and
the Spanish CICyT project TIC2000-1599 for supporting this work.

The next section proposes our generalization of the classic k-gram models
to the case of trees, which is shown to be equivalent to having a specialized
context-free grammar. A simplification of this model, called the child-annotated
model or $k = 3$, for short, is also presented in that section.

2 The Model 

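Let $\Omega = \{\tau_1, \tau_2, \ldots, \tau_{|\Omega|}\}$ be a treebank, that is, a sample of parse trees.
For all $k > 0$ and for all trees $t = \sigma(t_1 t_2 \ldots t_m) \in \Omega$ we define the k-root of $t$
as the tree

$$r_k(\sigma(t_1 t_2 \ldots t_m)) = \begin{cases} \sigma & \text{if } k = 1 \\ \sigma(r_{k-1}(t_1) \ldots r_{k-1}(t_m)) & \text{otherwise} \end{cases} \qquad (1)$$

The sets $f_k(t)$ of k-forks and $s_k(t)$ of k-subtrees are defined for all $k > 0$ as
follows:

$$f_k(\sigma(t_1 \ldots t_m)) = \bigcup_{j=1}^{m} f_k(t_j) \cup \begin{cases} \emptyset & \text{if } 1 + \mathrm{depth}(\sigma(t_1 \ldots t_m)) < k \\ r_k(\sigma(t_1 \ldots t_m)) & \text{otherwise} \end{cases} \qquad (2)$$

$$s_k(\sigma(t_1 \ldots t_m)) = \bigcup_{j=1}^{m} s_k(t_j) \cup \begin{cases} \sigma(t_1 \ldots t_m) & \text{if } 0 < \mathrm{depth}(\sigma(t_1 \ldots t_m)) < k \\ \emptyset & \text{otherwise} \end{cases} \qquad (3)$$

where $\mathrm{depth}(t)$ denotes the depth of the tree $t$, bearing in mind that for a single-node
tree it is zero.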
We define the treebank probabilistic k-testable grammar $G = (N, \Sigma, S, P)$
through:

- $N = r_{k-1}(f_k(\Omega)) \cup s_{k-1}(\Omega) \cup \{S\}$;

- $\Sigma$ is the set of labels in $\Omega$;

- $S$ is the start symbol;

- $P = \{(r, p(r)) \mid r \in R \wedge p(r) \in [0, 1]\}$ where $R \subset N \times (N \cup \Sigma)^+$ is
a set of production rules (usually written as $A \to \alpha$, where $A \in N$ and
$\alpha \in (N \cup \Sigma)^+$) and $p(r)$ is the emission probability associated with the rule
$r$. The set $P$ is built as follows:

• for every tree $t \in r_{k-1}(\Omega)$ add to $P$ the rule $S \to t$ with probability

$$p(S \to t) = \frac{1}{|\Omega|} \sum_{i=1}^{|\Omega|} \delta_{t,\, r_{k-1}(\tau_i)} \qquad (4)$$

where $\delta_{s,t} = 1$ if $s = t$ and zero otherwise;

• for every tree $\sigma(t_1 t_2 \ldots t_m) \in f_k(\Omega)$ add to $P$ the rule $r_{k-1}(\sigma(t_1 t_2 \ldots t_m)) \to
t_1 t_2 \ldots t_m$ with probability

$$p(r_{k-1}(\sigma(t_1 t_2 \ldots t_m)) \to t_1 t_2 \ldots t_m) = \frac{\sum_{i=1}^{|\Omega|} C(\sigma(t_1 t_2 \ldots t_m), \tau_i)}{\sum_{i=1}^{|\Omega|} C(r_{k-1}(\sigma(t_1 t_2 \ldots t_m)), \tau_i)} \qquad (5)$$

Here $C(t, \tau)$ counts the number of times that the fork $t$ appears in the
tree $\tau$;

• for every tree $\sigma(t_1 t_2 \ldots t_m) \in s_{k-1}(\Omega)$ add to $P$ the rule $\sigma(t_1 t_2 \ldots t_m) \to
t_1 t_2 \ldots t_m$ with probability

$$p(\sigma(t_1 t_2 \ldots t_m) \to t_1 t_2 \ldots t_m) = 1 \qquad (6)$$

Defined in this way, these probabilities satisfy the normalization constraint
for each $A \in N$,

$$\sum_{\alpha :\, A \to \alpha \in R} p(A \to \alpha) = 1 \qquad (7)$$

and the consistency constraint. PCFGs estimated from treebanks using the relative
frequency estimator always satisfy those constraints [7, 8].
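
The relative-frequency construction of the rule set can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: trees are `(label, children)` tuples, forks are counted once per occurrence, the start symbol is represented by a separate string `"<start>"` so that it cannot collide with a tree-valued nonterminal, and both sample trees are hypothetical (the first is the bracketing assumed for the one-tree example of this section).

```python
from collections import Counter
from fractions import Fraction


def T(label, *children):
    """Build a tree as a (label, children) tuple."""
    return (label, tuple(children))


def depth(t):
    return 0 if not t[1] else 1 + max(depth(c) for c in t[1])


def k_root(t, k):
    if k == 1:
        return (t[0], ())
    return (t[0], tuple(k_root(c, k - 1) for c in t[1]))


def forks(t, k):
    """All k-forks of t, one list entry per occurrence."""
    found = [f for c in t[1] for f in forks(c, k)]
    if 1 + depth(t) >= k:
        found.append(k_root(t, k))
    return found


def subtrees(t, k):
    """All subtrees of t whose depth d satisfies 0 < d < k."""
    found = [s for c in t[1] for s in subtrees(c, k)]
    if 0 < depth(t) < k:
        found.append(t)
    return found


def estimate(treebank, k, start="<start>"):
    """Return {(lhs, rhs): probability} for k >= 2, following the three cases above."""
    rules = {}
    roots = Counter(k_root(t, k - 1) for t in treebank)
    for root, count in roots.items():            # start rules
        rules[(start, (root,))] = Fraction(count, len(treebank))
    fork_counts = Counter(f for t in treebank for f in forks(t, k))
    lhs_counts = Counter(f for t in treebank for f in forks(t, k - 1))
    for fork, count in fork_counts.items():      # fork rules
        lhs = k_root(fork, k - 1)
        rules[(lhs, fork[1])] = Fraction(count, lhs_counts[lhs])
    for t in treebank:                            # subtree rules
        for sub in subtrees(t, k - 1):
            rules[(sub, sub[1])] = Fraction(1)
    return rules


TREE_1 = T("S", T("NP", T("N")),
           T("VP", T("V"),
             T("NP", T("NP", T("N")), T("PP", T("P"), T("NP", T("N"))))))
TREE_2 = T("S", T("NP", T("N")), T("VP", T("V"), T("NP", T("N"))))

rules = estimate([TREE_1, TREE_2], k=3)
totals = Counter()
for (lhs, _), prob in rules.items():
    totals[lhs] += prob
print(len(rules), all(total == 1 for total in totals.values()))
```

On this small sample the nonterminal VP(V NP) receives two rules with probability 1/2 each, and the probabilities of every left-hand side sum to one. The sketch does not handle label sets in which the same symbol occurs both as a leaf and as an internal node.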

Note that in this kind of model, the expansion probability for a given node
is computed as a function of the subtree of depth $k - 2$ that the node generates,
i.e., every nonterminal symbol stores a subtree of depth $k - 2$. In the particular
case $k = 2$, only the label of the node is taken into account (this is analogous to
the standard bigram model for strings) and the model coincides with the simple
rule-counting approach used in treebank grammars by Charniak [9].

However, in the case $k = 3$, we get a child-annotated model, that is, nonterminal
symbols $\sigma(\sigma_1 \sigma_2 \ldots \sigma_m)$ are defined by:

- the node label $\sigma$,

- the number $m$ of descendants (if any) and

- the labels in the descendants $\sigma_1, \sigma_2, \ldots, \sigma_m$ (if any) and their ordering.

As an ilustration, consider a very simple sample with only the tree in the 
figure 1. If we choose k — 2, then 

- n(S(NPVP)) = S; 
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- /2(S(NPVP)) = {S(NPVP),NP(N),VP(VNP),NP(NPPP),PP(PNP)} 

- si(S(NP VP)) = 0 

and the CFG is 



= ({S,NP,VP,PP},{N,V,P},S,?’), 
with V containing the rules 



S ^ NP VP 
NP NP PP 
NP^ N 
VP ^ VP PP 
VP V NP 
PP -+ P NP 

However, for k = 3 we obtain

- r_2(S(NP VP)) = S(NP VP)

- f_3(S(NP VP)) = {S(NP(N) VP(V NP)), VP(V NP(NP PP)),
  NP(NP(N) PP(P NP)), PP(P NP(N))}

- s_2(S(NP VP)) = {NP(N)}

and the CFG is

G^[3] = ({S, S(NP VP), NP(N), VP(V NP), NP(NP PP), PP(P NP)}, {N, V, P}, S, P),

with P containing the rules

S → S(NP VP)
S(NP VP) → NP(N) VP(V NP)
VP(V NP) → V NP(NP PP)
NP(NP PP) → NP(N) PP(P NP)
PP(P NP) → P NP(N)
NP(N) → N
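The k = 3 construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the tree encoding, the names `annotate` and `child_annotated_pcfg`, and the sample tree itself (inferred from the forks listed above, since the figure is not reproduced here) are assumptions, not part of the original work.

```python
from collections import Counter

# A tree is a (label, children) pair; a leaf has an empty list of children.
# Assumed sample tree, inferred from the forks listed in the text.
SAMPLE = ("S", [
    ("NP", [("N", [])]),
    ("VP", [("V", []),
            ("NP", [("NP", [("N", [])]),
                    ("PP", [("P", []), ("NP", [("N", [])])])])]),
])


def annotate(tree):
    """Child-annotated symbol: the label followed by the labels of its children."""
    label, children = tree
    if not children:
        return label
    return f"{label}({' '.join(child[0] for child in children)})"


def child_annotated_pcfg(trees, start="S"):
    """Relative-frequency rule probabilities for the child-annotated (k = 3) model."""
    counts = Counter()
    for tree in trees:
        # Start rule: S -> annotated root.
        counts[(start, (annotate(tree),))] += 1
        stack = [tree]
        while stack:
            node = stack.pop()
            children = node[1]
            if children:
                rhs = tuple(annotate(child) for child in children)
                counts[(annotate(node), rhs)] += 1
                stack.extend(children)
    # Normalize the counts per left-hand side.
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


rules = child_annotated_pcfg([SAMPLE])
for (lhs, rhs), p in sorted(rules.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

With a single training tree every left-hand side has exactly one expansion, so all six rules receive probability 1; with a real treebank the same counting yields the relative-frequency estimates.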

For comparison, if one uses a parent-annotated version of the grammar (following
Johnson [4]), one gets the following rules¹ (where the superindex is the
parent's label):

S → NP^S VP^S
NP^S → N
VP^S → V NP^VP
NP^VP → NP^NP PP^NP
NP^NP → N
PP^NP → P NP^PP
NP^PP → N

¹ As will be seen in section 3, parent-annotated grammars usually have fewer parameters
than child-annotated grammars, contrary to what this example may suggest.
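The parent annotation can be sketched in the same way. The tree encoding, the helper names and the sample tree (inferred from the forks given for the child-annotated example) are assumptions made for illustration only; leaves and the root are left unannotated, as in the rule list above.

```python
# A tree is a (label, children) pair; a leaf has an empty list of children.
# Assumed sample tree, inferred from the forks listed in the text.
SAMPLE = ("S", [
    ("NP", [("N", [])]),
    ("VP", [("V", []),
            ("NP", [("NP", [("N", [])]),
                    ("PP", [("P", []), ("NP", [("N", [])])])])]),
])


def annotated_name(node, parent_label):
    """Label of a node with its parent's label as superindex (written X^P)."""
    label, children = node
    # Leaves (N, V, P) and the root (no parent) stay unannotated.
    if not children or parent_label is None:
        return label
    return f"{label}^{parent_label}"


def parent_annotated_rules(tree, parent_label=None):
    """Collect the set of parent-annotated rules read off a tree."""
    label, children = tree
    rules = set()
    if children:
        lhs = annotated_name(tree, parent_label)
        rhs = tuple(annotated_name(child, label) for child in children)
        rules.add((lhs, rhs))
        for child in children:
            rules |= parent_annotated_rules(child, label)
    return rules


parent_rules = parent_annotated_rules(SAMPLE)
for lhs, rhs in sorted(parent_rules):
    print(f"{lhs} -> {' '.join(rhs)}")
```

On this tree the sketch yields seven rules, one more than the child-annotated grammar, which is the situation the footnote warns is not typical.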
Fig. 1. A sample parse tree



3 Experiments 

3.1 General Conditions 

We have performed a series of experiments to assess the structural disambiguation
performance of offspring-annotated models as compared to standard treebank
grammars, that is, to compare their relative ability for selecting the best
parse tree. To better put these comparisons in context, we have also evaluated
Johnson's [4] parent annotation scheme. To build training corpora and test sets
of parse trees, we have used English parse trees from the Penn Treebank, release
3. In all experiments the training corpus consisted of all of the trees (41,532) in
sections 02 to 22 of the Wall Street Journal portion of the Penn Treebank, modified
as above. This gives a total of more than 600,000 subtrees. The test set
contained all sentences in section 23 having no more than 40 words.

Chappelier and Rajman's [10] probabilistic extended Cocke-Younger-Kasami
parsing algorithm (which constructs a table containing generalized items like
those in Earley's [11] algorithm) was used to obtain the most likely parse for
each sentence in the test set; this parse was compared to the corresponding
gold-standard tree using the customary PARSEVAL evaluation
metric [12, 5, p. 432] after deannotating the most likely tree delivered by the
parser. PARSEVAL gives partial credit to incorrect parses by establishing the
labeled precision (P) and labeled recall (R) measures.
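As a rough illustration of how labeled precision and recall give partial credit, the following sketch compares the labeled constituents of a candidate tree with those of a gold tree. The tree encoding, the function names and the two example trees are assumptions; leaves are treated as terminals and are not counted as constituents, and both trees are assumed to contain at least one constituent.

```python
from collections import Counter


def labeled_brackets(tree, start=0):
    """Return (brackets, end), where brackets counts the (label, i, j) spans of a tree."""
    label, children = tree
    if not children:
        # A leaf covers one position and is not a constituent.
        return Counter(), start + 1
    brackets, pos = Counter(), start
    for child in children:
        child_brackets, pos = labeled_brackets(child, pos)
        brackets += child_brackets
    brackets[(label, start, pos)] += 1
    return brackets, pos


def parseval(candidate, gold):
    """Labeled precision and recall of a candidate tree against a gold tree."""
    cand, _ = labeled_brackets(candidate)
    ref, _ = labeled_brackets(gold)
    matched = sum((cand & ref).values())  # multiset intersection
    return matched / sum(cand.values()), matched / sum(ref.values())


# Hypothetical example: the gold tree attaches the PP to the object NP,
# while the candidate attaches it to the VP.
GOLD = ("S", [("NP", [("N", [])]),
              ("VP", [("V", []),
                      ("NP", [("NP", [("N", [])]),
                              ("PP", [("P", []), ("NP", [("N", [])])])])])])
CANDIDATE = ("S", [("NP", [("N", [])]),
                   ("VP", [("V", []),
                           ("NP", [("N", [])]),
                           ("PP", [("P", []), ("NP", [("N", [])])])])])

precision, recall = parseval(CANDIDATE, GOLD)
print(f"P = {precision:.3f}, R = {recall:.3f}")
```

Here every constituent of the flatter candidate also occurs in the gold tree, so precision is 1, while recall is 6/7 because the gold object NP spanning the PP is missing.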

3.2 Structural Disambiguation Results 

Here is a list of the models which were evaluated:

- A standard treebank grammar, with no annotation of node labels (NO or
k = 2), with probabilities for 15,140 rules.

- A child-annotated grammar (CHILD or k = 3), with probabilities for 92,830
rules.

- A parent-annotated grammar (PARENT), with probabilities for 23,020 rules.

- A both parent- and child-annotated grammar (BOTH), with probabilities for
112,610 rules.

As expected, the number of rules obtained increases as more information 
is conveyed by the node label, although this increase is not extreme. On the 
other hand, as the generalization power decreases, some sentences in the test 
set become unparsable, that is, they cannot be generated by the grammar. 



Annotation      R       P       f_{R=100%}   EXACT   PARSED   t
No (k = 2)      70.7%   76.1%   10.4%        10.0%   100%     57
Child (k = 3)   79.2%   74.2%   19.4%        13.6%   94.6%    9
Parent          80.0%   81.9%   18.5%        16.3%   100%     340
Both            80.1%   75.6%   20.5%        14.7%   79.6%    75

Table 1. Parsing results with different annotation schemes: labelled recall R, labelled
precision P, fraction of sentences with total labelled recall f_{R=100%}, fraction of exact
matches, fraction of sentences parsed by the annotated model, and average time per
sentence in seconds.

The results in Table 1 show that

- The parsing performance of parent-annotated and child-annotated PCFGs is
similar, and better than that obtained with the standard treebank PCFG.
The performance is measured both with the customary PARSEVAL metrics
and by counting the number of maximum-likelihood trees that (a) match
their counterparts in the treebank exactly, and (b) contain all of the
constituents in their counterpart (100% labeled recall, f_{R=100%}). The fact that
child-annotated grammars do not perform better than parent-annotated
ones may be due to their larger number of parameters compared to
parent-annotated PCFGs. This makes it difficult to estimate them accurately from
currently available treebanks (only about 6 subtrees per rule in the
experiments).

- The average time to parse a sentence shows that child annotation leads to
parsers that are much faster. This comes as no surprise because the number
of possible parse trees considered is drastically reduced; this is, however,
not the case with parent-annotated models.

It may be worth mentioning that parse trees produced by child-annotated models
tend to be more structured and refined than parent-annotated and unannotated
parses, which tend to use rules that lead to flat trees.

On the other hand, the child-annotated models, CHILD and BOTH, were unable
to deliver a parse tree for all sentences in the test set (CHILD parses 94.6% of the
sentences and BOTH, 79.6%). To be able to parse all sentences, the following smoothed
models were evaluated:

- A linear interpolated model, M1, where the probability of a tree t is

$$p(t) = \lambda p_1(t) + (1 - \lambda) p_2(t) \qquad (8)$$

here, $p_1(t)$ and $p_2(t)$ are the probabilities of the tree t in, respectively, the
model k = 3 and k = 2. The value of $\lambda$ was 0.7 (selected to minimize the
perplexity).

- A tree-level back-off model, M2, where the highest order model such that the
probability of the event is greater than zero is selected. Some care has to be taken
in order to preserve normalization.

- A rule-level back-off model, M3, that builds a new PCFG from the rules
of the tree-k-grammar models, adding new rules that allow switching
among those models. In particular, the new PCFG consists of three different
kinds of rules:

1. k = 3 rules with modified probability in order to preserve normalization,

2. back-off rules that allow switching to the lower model, and

3. modified k = 2 rules to switch back to the higher model.

The new grammar has 92,830 k = 3 rules, 15,140 k = 2 rules and 10,250
back-off rules.
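The first two smoothing schemes can be illustrated with a minimal sketch. The function names are hypothetical, and the back-off function shows only the model selection step: the renormalization that the text says is needed is deliberately left out.

```python
def interpolate(p_high, p_low, lam=0.7):
    """Equation (8): mix the k = 3 (p_high) and k = 2 (p_low) tree probabilities."""
    return lam * p_high + (1.0 - lam) * p_low


def back_off_choice(p_high, p_low):
    """Tree-level back-off: keep the higher-order model unless it assigns zero.

    Sketch only; the probabilities returned this way are not renormalized.
    """
    return p_high if p_high > 0.0 else p_low


# A tree that the k = 3 model cannot generate still gets a nonzero score.
print(interpolate(0.0, 0.01))      # 30% of the k = 2 probability
print(back_off_choice(0.0, 0.01))  # falls back to the k = 2 probability
```

Both schemes guarantee a nonzero probability for any tree the k = 2 grammar can generate, which is why the smoothed models cover the whole test set.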



MODEL   R       P       EXACT   PARSED   t
M1      80.2%   78.6%   17.4%   100%     57
M2      78.9%   74.2%   17.1%   100%     9.3
M3      82.4%   81.3%   17.5%   100%     68

Table 2. Parsing results with different smoothed models.

The results in Table 2 show that:

- M2 is the fastest but its performance is worse than that of M1 and M3.

- M1 and M3 parse sentences at a comparable speed but recall and precision
are better using M3.

Compared to unsmoothed models, smoothed ones:

- Cover the whole test set (k = 3 did not).

- Parse at reasonable speed (compared to PARENT).

- Achieve acceptable performance (k = 2 did not).



4 Conclusion 

We have introduced a new probabilistic context-free grammar model,
offspring-annotated PCFG, in which the grammar variables are specialized by annotating
them with the subtree they generate up to a certain level. In particular, we have
studied child-annotated models (one level) and have compared their parsing
performance to that of unannotated PCFG and of parent-annotated PCFG [4].
Offspring-annotated models may be seen as a special case of a very general
probabilistic state-based model, which in turn is based on probabilistic
bottom-up tree automata. The experiments show that:

- The parsing performance of parent-annotated and the proposed child-annotated
PCFG is similar.

- Parsers using child-annotated grammars are, however, much faster because
the number of possible parse trees considered is drastically reduced; this is,
however, not the case with parent-annotated models.

- Child-annotated grammars have a larger number of parameters than
parent-annotated PCFG, which may make it difficult to estimate them accurately
from currently available treebanks.

- Child-annotated models tend to give very structured and refined parses
instead of flat parses, a tendency not so strong for parent-annotated grammars.
Abstract. The main goal of this research is to study the usefulness
of the Simulated Annealing (SA) approach, developed in the context of 
the Fuzzy Inductive Reasoning (FIR) methodology, for the automatic 
identification of fuzzy partitions in the human Central Nervous System 
(CNS) modeling problem. The SA algorithm can be viewed as a pre- 
process of the FIR methodology that allows the modeler to use it in a 
more efficient way. Two different SA algorithm cost functions have been 
studied and evaluated in this paper. The new approach is applied to 
obtain accurate models for the five controllers that compose the CNS. 
The results are compared and discussed with those obtained by other 
inductive methodologies for the same problem. 



1 Introduction 

The human central nervous system controls the hemodynamical system, by gen- 
erating the regulating signals for the blood vessels and the heart. These signals 
are transmitted through bundles of sympathetic and parasympathetic nerves, 
producing stimuli in the corresponding organs and other body parts. 

In this work, CNS controller models are identified for a specific patient by
means of the Fuzzy Inductive Reasoning (FIR) methodology. FIR is a data-driven
methodology that uses fuzzy and pattern recognition techniques to infer
system models and to predict their future behavior. It has the ability to describe
systems that cannot easily be described by classical mathematics (e.g. linear
regression, differential equations), i.e. systems for which the underlying physical
laws are not well understood. The FIR methodology is composed of four main
processes, namely: fuzzification, qualitative model identification, fuzzy forecast
and defuzzification.

The first step of the FIR methodology is the fuzzification process, which
converts quantitative data stemming from the system into fuzzy data. In this process
the number of classes of each variable (i.e. the partition) needs to be provided.
In this paper an algorithm based on simulated annealing,
developed in the context of FIR, is used to automatically suggest a good
partition of the system variables in an efficient way. The SA algorithm can be viewed
as a pre-process of the FIR methodology that allows the modeler not to rely on
heuristics for the definition of a system variable partition. Two SA algorithm cost
functions are proposed in this research that make use of the qualitative model
identification and the forecast processes of the FIR methodology. A brief description
of these processes is given next. The qualitative model identification process of
the FIR methodology is responsible for finding causal and temporal relations
between variables and therefore for obtaining the best model that represents the
system. A simplified diagram of the qualitative model identification process is
presented in Figure 1.



Fig. 1. Simplified diagram of the FIR qualitative model identification process



A FIR model is composed of a mask (model structure) and a pattern rule
base. An example of a mask is presented in Figure 1. Each negative element in
the mask is called an m-input (mask input). It denotes a causal relation with the
output, i.e. it influences the output up to a certain degree. The enumeration
of the m-inputs is immaterial. The single positive value
denotes the output. In position notation the mask of Figure 1 can be written
as (2, 5, 8, 11, 12), enumerating the mask cells from top to bottom and from
left to right. The qualitative identification process evaluates all the possible
masks and concludes which one has the highest prediction power by means of an
entropy reduction measure, called the quality of the mask Q. The mask with the
maximum Q value is the optimal mask. Starting from the fuzzified system data
and using the optimal mask, the pattern rule base is then synthesized. Together,
the pattern rule base and the mask constitute the FIR model. Once the pattern
rule base and the optimal mask are available, system predictions can take place
using the FIR inference engine. This process is called fuzzy forecast. The FIR inference
engine is a specialization of the k-nearest neighbor rule, commonly used in the
pattern recognition field. Defuzzification is the inverse process of fuzzification. It
converts the qualitative predicted output into quantitative values that
can then be used as inputs to an external quantitative model. For deeper insight
into the FIR methodology, refer to [1].

2 Simulated Annealing for Identification of Fuzzy 
Partitions in FIR 

Simulated annealing is a generalization of a Monte Carlo method and it is used
to approximate the solution of large combinatorial optimization problems [4].
A simulated annealing algorithm consists of two loops. The outermost loop
sets the temperature and the innermost loop runs a Metropolis Monte Carlo
simulation at that temperature. The algorithm starts with an initial solution to
the problem, which is also the best solution so far, and a value for an initial high
temperature. Each iteration consists of the random selection of a new solution
(candidate solution) from the neighborhood of the current one. The cost function
of the candidate solution is evaluated and the difference with respect to the cost
function value of the current solution is computed. If this difference is negative
the candidate solution is accepted. If the difference is positive the candidate
solution is accepted with a probability based on the Boltzmann distribution.
The accepted candidate solution becomes the current solution and, if its cost
function value is lower than that of the best solution, the best solution is updated. If
the candidate solution is rejected the current solution stays the same and it is
used in the next iteration. The temperature is lowered in each iteration down to
a freezing temperature where no further changes occur. A detailed description
of the simulated annealing algorithm developed for the automatic identification
of fuzzy partitions in the FIR methodology can be found in [2].
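The acceptance rule and the bookkeeping of the current and best solutions described above can be sketched as follows. This is a generic sketch, not the algorithm of [2]: the function name, the parameter names, the geometric cooling schedule and the default temperatures are all assumptions.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t_start=1.0, t_freeze=1e-3,
                        cooling=0.9, rng=None):
    """Minimize `cost` starting from `initial`; `neighbor(solution, rng)` proposes moves."""
    rng = rng or random.Random()
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t_start
    while temperature > t_freeze:
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Improvements are always accepted; worse candidates are accepted
        # with the Boltzmann probability exp(-delta / T).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # assumed geometric cooling schedule
    return best, best_cost


# Toy usage: minimize (x - 7)^2 over the integers with +/-1 moves.
toy_best, toy_cost = simulated_annealing(
    0,
    lambda x: (x - 7) ** 2,
    lambda x, rng: x + rng.choice((-1, 1)),
    rng=random.Random(0),
)
print(toy_best, toy_cost)
```

Because the best solution is tracked separately from the current one, the returned cost can never be worse than the cost of the initial solution, even if the walk ends on an uphill move.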

Two main aspects of the simulated annealing algorithm that need to be 
considered here are the new solution generation mechanism and the cost function. 
Both are highly important to achieve a good performance of the algorithm. 

The new solution generation mechanism consists of two tasks. The first one is
the generation of the initial partition at the beginning of the algorithm execution.
The second one is the generation of a new solution (i.e. candidate solution)
starting from the current solution, in each algorithm iteration. Two options have
been studied in this paper to generate an initial partition: 3-classes partition and
random partition. The first one sets all the variables to 3 classes. The second one
performs a random generation of the number of classes for each system variable.
In this research the number of classes allowed for each system variable is in the
range [2, 9].

The procedure to generate a new solution, i.e., the candidate solution, from
the current one is to increment or decrement by one the number of classes
associated with a certain system variable. The variable that is going to be modified
is chosen randomly out of the vector of variables. The decision to increase or
decrease the number of classes of this variable is also taken randomly.
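A minimal sketch of the two generation tasks follows. The function names are hypothetical, and the handling of the range limits is an assumption: the text does not say what happens when a step would leave the allowed range, so here the step is simply reversed.

```python
import random

MIN_CLASSES, MAX_CLASSES = 2, 9  # allowed number of classes per variable


def three_classes_partition(n_vars):
    """Initial partition that assigns 3 classes to every variable."""
    return (3,) * n_vars


def random_partition(n_vars, rng):
    """Initial partition with a random number of classes per variable."""
    return tuple(rng.randint(MIN_CLASSES, MAX_CLASSES) for _ in range(n_vars))


def neighbor_partition(partition, rng):
    """Increment or decrement by one the classes of one randomly chosen variable."""
    index = rng.randrange(len(partition))
    step = rng.choice((-1, 1))
    value = partition[index] + step
    # Assumption: a step that would leave [2, 9] is reversed instead.
    if value < MIN_CLASSES or value > MAX_CLASSES:
        value = partition[index] - step
    return partition[:index] + (value,) + partition[index + 1:]


rng = random.Random(1)
start = three_classes_partition(2)
print(start, "->", neighbor_partition(start, rng))
```

Each candidate therefore differs from the current partition in exactly one variable and by exactly one class.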

Two different cost functions have been studied in this work: the quality of 
the optimal mask and the prediction error of the training data set. 
As has been explained earlier, in the qualitative model identification process
of the FIR methodology the optimal mask (i.e. the best model structure) is
identified by means of a quality measure, Q. The quality of a mask is a value
between 0 and 1, where 1 indicates the highest quality. Therefore, the first cost
function proposed is 1 − Q, because the algorithm
minimizes the cost function.

The second cost function is defined as the prediction error of a portion of
the training data set. The normalized mean square error in percentage (MSE),
given in equation 1, is used for this purpose.

$$\mathrm{MSE} = \frac{E[(y(t) - \hat{y}(t))^2]}{y_{var}} \cdot 100\% \qquad (1)$$

$\hat{y}(t)$ is the predicted output, $y(t)$ the system output and $y_{var}$ denotes the variance
of $y(t)$. The idea is to use part of the training data set to identify the model and
the rest of the data set to evaluate the prediction performance of that model.
The prediction error of the portion of the training data set not used in the model
identification process is used as the cost function for the SA algorithm. The size
of the portion of the training data set actually used for cost function evaluation
purposes is defined with respect to the size of the whole training data set.
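The normalized error and the split of the training signal can be sketched as follows. The function names are hypothetical, the 75% default is only a placeholder, and the use of the population variance of the evaluated segment is an assumption, since the text does not specify how the variance is estimated.

```python
from statistics import fmean, pvariance


def normalized_mse(actual, predicted):
    """Mean squared prediction error divided by the variance of the actual signal, in percent."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return fmean(squared_errors) / pvariance(actual) * 100.0


def split_training_signal(signal, fraction=0.75):
    """Split a signal into an identification part and a cost-evaluation part."""
    cut = int(len(signal) * fraction)
    return signal[:cut], signal[cut:]


signal = [1.0, 2.0, 3.0, 4.0]
print(normalized_mse(signal, signal))       # perfect prediction
print(normalized_mse(signal, [2.5] * 4))    # always predicting the mean
```

With this normalization a model that always predicts the mean of the signal scores 100%, so values well below 100% indicate that the model captures real structure in the data.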

3 Central Nervous System Modeling 

The central nervous system is composed of five controllers, namely, heart rate
(HR), peripheral resistance (PR), myocardial contractility (MC), venous tone
(VT) and coronary resistance (CR). All the CNS controllers are SISO models
driven by the same input variable, the carotid sinus pressure (CSP). The input
and output signals of the CNS controllers were recorded with a sampling period
of 0.12 seconds from simulations of the purely differential equation model [3],
obtaining 7279 data points. The model had been tuned to represent a specific
patient suffering from a coronary arterial obstruction, by making the four different
physiological variables (right auricular pressure, aortic pressure, coronary blood
flow, and heart rate) of the simulation model agree with the measurement data
taken from the real patient. The five models obtained were validated by using
them to forecast six data sets not employed in the training process. Each one
of these six test data sets, with a size of about 600 data points each, contains
signals representing specific morphologies, allowing the validation of the model
for different system behaviors.

The main goal of this research is to study the usefulness of the SA approach 
as a pre-processing tool of the FIR methodology for the identification of good 
models for each of the five controllers. Let us explain the experimentation pro- 
cedure for the coronary resistance controller. The same strategy has been used 
for the other four controllers. Their results are presented later. 

As mentioned before, two cost functions were studied in this work. Table 1
shows the results obtained for the coronary resistance controller when 1 − Q was
used as cost function. Table 2 presents the results of the same controller when
the cost function is defined as the prediction MSE of a portion of the training
data set. In this application the last 25% of the training signal is used for cost
function evaluation and only the first 75% of the signal is used to obtain the
FIR models.

Table 1. Partition results of the CR controller obtained using 1 − Q as cost function

Ini. Part.   Fin. Part.   Opt. Mask   Q        1−Q      MSEtest   #GS   Time
(CSP,CR)     (CSP,CR)
(3,3)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     35    2.98
(3,3)        (7,3)        (1,4,6)     0.9776   0.0224   4.76%     26    1.97
(3,3)        (8,3)        (1,4,6)     0.9776   0.0224   4.25%     37    2.56
(3,3)        (6,3)        (1,4,6)     0.9762   0.0238   1.75%     27    1.41
(3,3)        (5,3)        (1,4,6)     0.9749   0.0251   2.34%     24    1.69
(3,3)        (4,3)        (1,4,6)     0.9748   0.0252   1.33%     26    1.38
(8,8)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     33    6.17
(9,6)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     26    3.18
(7,2)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     35    2.38
(6,5)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     25    3.60
(5,5)        (9,3)        (1,4,6)     0.9787   0.0213   3.85%     35    3.24
(5,2)        (5,3)        (1,4,6)     0.9749   0.0251   2.34%     37    1.44

Optimal solution: partition (9,3); Q = 0.9787

Both the 3-classes and the random options have been evaluated as initial
partitions. The upper rows of Tables 1 and 2 show the results of the 3-classes
initial partition, whereas the lower rows present the results of the random initial
partition. For both options, 40 executions of the SA algorithm were performed.
For an initial partition of 3 classes the SA algorithm suggested up to 6 different
final partitions when 1 − Q is used as cost function (see Table 1) and 3 possible
final partitions when the prediction error is used as cost function (see Table 2).
When the random initial partition is used, only 2 and 4 different final partitions
are suggested by the SA algorithm for the 1 − Q and prediction error cost
functions, respectively.

Table 2. Partition results of the CR controller obtained using the prediction error of
the last 25% of the training data set as cost function

Ini. Part.   Fin. Part.   Opt. Mask   Q        MSEtrain   MSEtest   #GS   Time
(CSP,CR)     (CSP,CR)
(3,3)        (2,5)        (1,4,5,6)   0.9642   0.08%      0.15%     19    19.40
(3,3)        (3,4)        (4,5,6)     0.9638   0.12%      0.28%     32    38.39
(3,3)        (6,4)        (3,4,6)     0.9666   0.17%      0.42%     33    22.37
(4,4)        (2,5)        (1,4,5,6)   0.9642   0.08%      0.15%     24    17.51
(2,6)        (2,5)        (1,4,5,6)   0.9642   0.08%      0.15%     19    10.79
(3,5)        (3,4)        (4,5,6)     0.9638   0.12%      0.28%     23    12.31
(6,5)        (6,4)        (3,4,6)     0.9666   0.17%      0.42%     33    11.55
(5,4)        (6,4)        (3,4,6)     0.9666   0.17%      0.42%     30    16.28
(9,6)        (7,4)        (4,5,6)     0.9677   0.18%      0.41%     28    14.29

Optimal solution: partition (2,5); MSEtrain = 0.08%

The tables are organized as follows. The first column indicates the initial
partition from which the SA algorithm starts the search. The second column
presents the final partition suggested by the SA algorithm when the freezing
temperature is reached (i.e. the algorithm stops). Note that the final partition is
the input parameter to the fuzzification process of the FIR methodology. The
third and fourth columns contain the optimal mask obtained by FIR for that
specific partition (in position notation) and its associated quality, respectively.
The fifth column corresponds to the cost function evaluation. Note that in Table
1 the cost function is 1 − Q and in Table 2 the cost function is the prediction
MSE of the last 25% of the data points of the training set. The next column shows
the prediction error of the test data sets. As mentioned before, six test data sets
of about 600 data points each are available for each controller. The results presented
in the tables are the mean value of the prediction errors obtained for these
six test data sets. The seventh column indicates the total number of generated
solutions during the execution of the SA algorithm. The last column contains the
CPU time (in seconds) used by the algorithm to find the final partition. Clearly,
the biomedical application presented in this paper is not a large optimization
problem; it is rather small, since only two variables are involved
and a maximum of nine classes is allowed (in fact there are only eight, because
class 1 is not used). Therefore, there exist 64 possible solutions and an exhaustive
search can be performed easily. However, it is interesting to work with a
real application that shows clearly the usefulness of the SA algorithm for the
automated definition of fuzzy sets in the FIR methodology. Moreover, the FIR
performance is considerably increased when the SA algorithm is used in the CNS
application.
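The size of the search space mentioned above, two variables with between two and nine classes each, can be checked by direct enumeration. The sketch below is only illustrative: the function names are hypothetical and the toy cost function is a stand-in for a real evaluation with FIR.

```python
from itertools import product

CLASS_COUNTS = range(2, 10)  # 2..9 classes per variable (class 1 is not used)


def all_partitions(n_vars=2):
    """Every assignment of a number of classes to each of n_vars variables."""
    return list(product(CLASS_COUNTS, repeat=n_vars))


def exhaustive_search(cost, n_vars=2):
    """Return the partition with the lowest cost by trying all of them."""
    return min(all_partitions(n_vars), key=cost)


print(len(all_partitions()))  # 8 * 8 candidate partitions

# Toy cost, standing in for 1 - Q or the prediction error of a FIR model.
print(exhaustive_search(lambda p: abs(p[0] - 9) + abs(p[1] - 3)))
```

The count grows as 8 to the power of the number of variables, so exhaustive search stops being practical for systems with many variables, which is where a stochastic search becomes attractive.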

A closer look at Table 1 shows that the optimal solution, which corresponds
to the (9,3) partition with a quality of 0.9787, is reached with both initial
partition options. All the final partitions obtained when a (3,3) initial partition
is used have in common that a partition of 3 classes is always suggested for the
output variable, whereas 4, 5, 6, 7, 8 or 9 classes are good partitions for the input
variable. Notice that the qualities of all the suggested partitions are very close
to the optimal one. With a random initial partition, only two final partitions
are suggested by the SA algorithm, i.e. the optimal one (9,3) and a suboptimal
one (5,3). The proportion shown in Table 1, i.e. five times partition (9,3) vs. one
time partition (5,3), is the relation encountered in the 40 runs of the algorithm.



Table 3. MSE prediction errors of the CNS controller models using NARMAX, TDNN 
and RNN methodologies (mean value of the 6 test data sets for each controller) 



HR PR MC VT CR 

NARMAX 9.3% 18.5% 22.0% 22.0% 25.5% 
TDNN 15.3% 33.7% 34.0% 34.0% 55.6% 
RNN 18.3% 31.1% 35.1% 34.7% 57.1% 
Table 2 shows the results of the same controller when the prediction error
of part of the training data set has been used as a cost function for the SA
algorithm. The function to be minimized now is the MSEtrain. It is worth
noting that in this case the mask is obtained using exclusively the first 75% of the
data points of the training signal. Therefore, the data used for the cost function
evaluation have not been seen by the model before. This is the reason why the best
predictions obtained for the last 25% of the values of the training set do not necessarily
correspond to the partitions whose optimal mask has the highest quality.
However, the quality of the optimal masks found for the suggested partitions is
still high, i.e. 0.96. The optimal solution is the partition (2,5) with an MSEtrain
of 0.08%, which is very low. The SA algorithm is able to find the best final
partition with both initial partition options, as also happened for the quality
cost function. The (3,4) and (6,4) partitions, with errors of 0.12% and 0.17%
respectively, are the best suboptimal solutions. Therefore, the SA algorithm
obtains in fact the best three final partitions. Notice that although the number
of generated solutions remains almost the same as in Table 1, the CPU time has
considerably increased. This is due to the fact that the cost function evaluation
is much more expensive computationally. Now, not only is the qualitative model
identification process of the FIR methodology executed, but so is the fuzzy
forecast process.



Table 4. Partition results of the HR, PR, MC and VT controllers obtained using the 1 − Q
cost function and the prediction error of the last 25% of the training data set as cost function

                 HR                                   PR
Cost function    Fin. Part.   1−Q        MSEtest      Fin. Part.   1−Q        MSEtest
1−Q              (7,2)*       0.1674     13.43%       (8,7)*       0.1448     5.99%
                 (8,2)        0.1861     12.63%       (7,7)        0.1505     4.59%
                 (7,4)        0.2739     2.61%        (5,7)        0.1564     3.15%

Cost function    Fin. Part.   MSEtrain   MSEtest      Fin. Part.   MSEtrain   MSEtest
MSEtrain         (3,7)*       0.89%      9.15%        (4,9)*       0.93%      2.28%
                 (5,9)        1.01%      2.54%        (7,7)        1.08%      3.34%
                 (6,7)        1.15%      13.39%       (2,6)        1.64%      3.77%

                 MC                                   VT
Cost function    Fin. Part.   1−Q        MSEtest      Fin. Part.   1−Q        MSEtest
1−Q              (8,7)*       0.1866     11.88%       (8,7)*       0.1858     13.00%
                 (7,7)        0.1950     42.45%       (7,7)        0.1952     41.88%
                 (5,7)        0.2019     52.94%       (5,7)        0.2032     53.01%

Cost function    Fin. Part.   MSEtrain   MSEtest      Fin. Part.   MSEtrain   MSEtest
MSEtrain         (4,9)*       0.60%      2.51%        (2,5)*       0.6117%    1.66%
                 (2,5)        0.63%      2.74%        (2,8)        0.6359%    1.55%
                 (3,9)        1.10%      3.87%        (3,7)        0.7855%    2.12%


It is interesting to analyze the MSEtest columns of both tables. As expected,
the MSEtrain cost function is able to obtain partitions with higher performance
on the prediction of the test data sets than the ones obtained by the 1 − Q cost
function. However, the results obtained in both cases are very good compared
with those obtained when other inductive methodologies are used. Table 3
contains the predictions achieved when NARMAX, time delay neural networks
and recurrent neural networks are used for the same problem. The columns of the
table specify the average prediction error of the 6 test sets for each controller. All
methodologies used the same training and test data sets previously described.

The errors obtained for all the controllers using the SA approach together
with the FIR methodology are much better than the ones obtained by the
inductive methodologies presented in Table 3. Moreover, the highest MSEtest
of 4.76% obtained with the 1 − Q cost function is about half the lowest
error obtained with these methodologies, i.e. 9.3%. Therefore, in this application,
both cost functions can be considered good for the task at hand. The 1 − Q cost
function needs less time to be evaluated but the performance with respect to
the test set prediction is lower. Conversely, the MSEtrain cost function is more
expensive from the CPU time point of view but the performance is higher. The
user should decide which cost function to use taking into account the size of the
optimization problem and his/her own needs.

Table 4 contains the partition results of the other four CNS controllers. The
random initial partition option has been used in all the executions. The SA
algorithm has been executed 40 times for both cost functions for each controller.
The final partition, the value of the cost function and the mean MSE of the 6
test data sets are presented for each controller and cost function. An * means
that the partition is the best possible one, and therefore the optimal
solution. As can be seen in Table 4, the optimal solution is reached for both cost
functions for all four controllers. The CPU time and number of generated solutions
are equivalent to those of the CR controller in Tables 1 and 2. It is interesting
to analyze the MSEtest of the HR, PR, MC and VT controllers. The errors of
the test sets obtained when the MSEtrain cost function is used are quite good
for all controllers, and much better than the ones obtained using the inductive
methodologies of Table 3. However, this is not the case for all controllers when
the 1 − Q cost function is used. Notice that, although the SA algorithm finds
both the best solution and good suboptimal solutions, the prediction errors of
the test data sets obtained are of the same order of magnitude as the ones
obtained by the NARMAX, time delay and recurrent neural networks,
particularly for the MC and VT controllers. In this case, the quality measure used
by the FIR methodology is not doing a good job. It would be interesting to study
alternative quality measures for the task at hand.



4 Conclusions 

In this paper the usefulness of a simulated annealing approach for the automated 
definition of fuzzy sets in the identification of human central nervous system FIR 
models has been shown. Two cost functions have been evaluated and compared 
from the perspective of their performance and computational time. The results
obtained in the CNS applications are much better than the ones obtained by
other inductive methodologies such as NARMAX, time delay neural networks 
and recurrent neural networks. 

Abstract. The aim of this research is to automatically tune a good
fuzzy partition, i.e. to determine the number of classes of each system
variable, in the context of the Fuzzy Inductive Reasoning (FIR) methodology.
FIR is an inductive methodology for modelling and simulating those
systems for which no previous structural knowledge is available. The
first step of the FIR methodology is the fuzzification process, which converts
quantitative variables into fuzzy qualitative variables. In this process it
is necessary to define the number of classes into which each variable is
going to be discretized. In this paper an algorithm based on simulated
annealing is developed to suggest a good partition in an automatic way.
The proposed algorithm is applied to an environmental system.



1 Introduction 

The Fuzzy Inductive Reasoning (FIR) methodology emerged from the General 
Systems Problem Solving (GSPS) developed by Klir [1]. FIR is a data driven 
methodology based on system behavior rather than structural knowledge. It
is a very useful tool for modelling and simulating those systems for which no
previous structural knowledge is available. FIR is composed of four main
processes, namely: fuzzification, qualitative model identification, fuzzy forecasting,
and defuzzification. Figure 1 describes the processes of the FIR methodology.

Fig. 1. FIR structure



The fuzzification process converts quantitative data stemming from the system
into fuzzy data, i.e. qualitative triples. The first element of the triple is the
class value, the second element is the fuzzy membership value, and the third
element is the side value. The side value indicates whether the quantitative value
is to the left or to the right of the peak value of the associated membership
function.
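
The conversion into qualitative triples can be illustrated with a minimal Python sketch. It assumes landmarks obtained by an equal frequency partition and a bell-shaped membership function that equals 1 at the class centre and 0.5 at the landmarks. The membership shape, the encoding of the side value as -1, 0 or +1, and the names `efp_landmarks` and `fuzzify` are illustrative assumptions, not taken from the FIR implementation.

```python
import math

def efp_landmarks(values, n_classes):
    """Equal frequency partition: each class receives about the same number of samples.

    Assumes there are at least n_classes samples.
    """
    ordered = sorted(values)
    n = len(ordered)
    marks = [ordered[0]]
    for c in range(1, n_classes):
        cut = c * n // n_classes
        # Place the landmark halfway between the two neighbouring samples.
        marks.append((ordered[cut - 1] + ordered[cut]) / 2.0)
    marks.append(ordered[-1])
    return marks

def fuzzify(x, marks):
    """Map a quantitative value to a qualitative triple (class, membership, side)."""
    n_classes = len(marks) - 1
    cls = n_classes
    for c in range(1, n_classes + 1):
        if x <= marks[c]:
            cls = c
            break
    left, right = marks[cls - 1], marks[cls]
    peak = (left + right) / 2.0
    # Assumed bell shape: membership 1 at the peak and 0.5 at both landmarks.
    tau = -4.0 * math.log(0.5) / (right - left) ** 2
    membership = math.exp(-tau * (x - peak) ** 2)
    side = 0 if x == peak else (-1 if x < peak else 1)
    return cls, membership, side

marks = efp_landmarks(range(1, 13), 3)   # landmarks 1, 4.5, 8.5, 12
print(fuzzify(6.5, marks))               # value located at the peak of class 2
```

A value lying exactly on a landmark receives membership 0.5 in this sketch, which is why the side value is needed to recover the quantitative value during defuzzification.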

The model identification process is able to obtain good qualitative relations 
between the variables that compose the system, building a pattern rule base that 
guides the fuzzy forecasting process. 

The fuzzy forecasting process predicts system behavior. The FIR inference
engine is a specialization of the k-nearest neighbor rule, commonly used in the
pattern recognition field.

Defuzzification is the inverse process of fuzzification. It makes it possible to
convert the qualitative predicted output into a quantitative variable that can
then be used as input to an external quantitative model. It has been shown in
previous works that the FIR methodology is a powerful tool for the identification
and prediction of real systems, especially when poor or no structural knowledge
is available [2,5]. For a deeper insight into the FIR methodology the reader is referred
to [4].

As can be seen in figure 1, before the fuzzification process of the FIR methodology
can start it is necessary to define some external parameters, i.e. the partition
(number of classes of each system variable) and the landmarks (limits between
classes). The default number of classes for each system
variable is three, and the equal frequency partition (EFP) is used as the default
method to obtain the landmarks of the classes. These default values have been
used in different applications, usually with good results. However, experience
has shown that in some of them, e.g. biomedical and ecological applications, the
determination of the partition parameter needed in the fuzzification step becomes relevant for
the identification of a good model that captures the system behavior accurately.
The automatic determination of a good partition as a pre-process of the FIR
methodology is an interesting and useful alternative. To achieve this goal an
algorithm based on simulated annealing is presented in this paper and used in
an environmental application, i.e. the prediction of ozone concentration in a specific
area of Mexico City. The proposed algorithm is introduced in section 2. In section
3 the ozone application is addressed and the results obtained are discussed. Finally,
the conclusions of this research are given.

2 Simulated Annealing Algorithm 

Simulated annealing is a generalization of a Monte Carlo method that was
introduced by Metropolis et al. in 1953 [3]. This technique is used to approximate the
solution of very large combinatorial optimization problems and is based on the
manner in which liquids freeze in the process of annealing [7]. In an annealing
process a melt, initially at high temperature and disordered, is slowly cooled so
that the system at any temperature is approximately in thermodynamic equilibrium.
Cooling proceeds until the final temperature is reached, which corresponds
to the most stable (lowest energy) system state. If the initial temperature of the
system is too low or cooling is not done sufficiently slowly, the system may be
trapped in a local minimum energy state.

A simulated annealing algorithm consists of two nested loops. The outer
loop sets the temperature and the inner loop runs a Metropolis Monte Carlo
simulation at that temperature. The algorithm starts with an initial solution to
the problem, which is also the best solution so far, and a value for an initial high
temperature. Each iteration consists of the random selection of a new solution
(called candidate solution from now on) from the neighborhood of the current
one. The cost function of the candidate solution is evaluated and the difference
with respect to the cost function value of the current solution is computed ($\delta$
in equation 1). If this difference is negative, i.e. the cost function value of the
candidate solution is lower than that of the current solution, the candidate
solution is accepted. If the difference is positive the candidate solution is accepted
with a probability based on the Boltzmann distribution (equation 1).

$$P_{Boltzmann}(\delta) = \exp\left(-\delta / (k \cdot T)\right) \qquad (1)$$

where $T$ is the temperature value and $k$ is Boltzmann's constant. The accepted
candidate solution becomes the current solution and, if its cost function
value is lower than that of the best solution, the best solution is updated. If the
candidate solution is rejected, i.e. the Boltzmann probability is less than the random
number generated, the current solution stays the same and is used in the next
iteration. The temperature is lowered in each iteration of the outer loop down to a freezing
temperature where no further changes occur. The set of parameters that determine the
temperature decrement is called the cooling schedule. These parameters are the
initial temperature, the function that decrements the temperature between
successive stages, the number of transitions needed to reach the quasi-equilibrium
for each temperature value and the stop criterion.
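
The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that Boltzmann's constant is absorbed into the temperature; the function name `accept` and the test values are hypothetical.

```python
import math
import random

def accept(delta, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse solutions."""
    if delta < 0:
        return True
    # Boltzmann's constant is absorbed into the temperature in this sketch.
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(0)
hot = sum(accept(0.1, 1.0, rng) for _ in range(1000))
cold = sum(accept(0.1, 0.01, rng) for _ in range(1000))
print(hot, cold)  # far more worsening moves are accepted at the high temperature
```

At a high temperature most worsening moves pass the test, which lets the search leave local minima; as the temperature drops the rule degenerates into a pure descent.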

The main aspects to be considered in a simulated annealing implementation
are: 1) solution configuration, 2) new solutions generation mechanism, 3) cost
function and 4) cooling schedule. These aspects, for the algorithm proposed in
this paper, are explained in detail next, while the algorithm is shown in figure 2.

Solution Configuration 

The solution should contain the number of classes for each variable. The
configuration chosen is a vector with as many columns as there are
system variables, containing integers in the range [2 ... maxNC], where maxNC is the
maximum number of classes allowed.

New Solutions Generation Mechanism 

Two options can be used to generate the initial partition, i.e. current solution. 
The first one sets all the variables to 3 classes (default option in the current 
FIR implementation). The second one corresponds to a random generation of 
the number of classes for each system variable. 

The procedure to generate a new solution, i.e. candidate solution, from the
current one is to increment or decrement by one the number of classes associated
with a certain system variable. The variable that is going to be modified is chosen
randomly from the vector of variables. The decision to increase or decrease the
number of classes of this variable is also random. If the variable is already at one of the
extremes, i.e. 2 or maxNC, the increment or the decrement operator, respectively,
is applied.
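
A minimal Python sketch of the two generation steps follows. The function names are hypothetical and the sketch assumes maxNC is at least 3, so that a move is always possible.

```python
import random

def random_partition(n_vars, max_nc, rng=random):
    """Random initial solution: one class count in [2, max_nc] per system variable."""
    return [rng.randint(2, max_nc) for _ in range(n_vars)]

def neighbor(partition, max_nc, rng=random):
    """Candidate solution: change the class count of one random variable by one."""
    candidate = list(partition)
    i = rng.randrange(len(candidate))
    step = rng.choice((-1, 1))
    # At the extremes the only legal move is forced (assumes max_nc >= 3).
    if candidate[i] == 2:
        step = 1
    elif candidate[i] == max_nc:
        step = -1
    candidate[i] += step
    return candidate

rng = random.Random(1)
current = random_partition(7, 6, rng)
print(current, neighbor(current, 6, rng))
```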

Cost Function 

An important aspect in this research is to define an appropriate cost function
for the evaluation of the partitions. To address this issue it is necessary to look
closer at the qualitative model identification process of the FIR methodology.

In the process of modeling, it is desired to discover the causal and temporal
relations between the inputs and the output of the system that make the
resulting state transition matrix as deterministic as possible. The more
deterministic the state transition matrix is, the higher is the likelihood that the future
system behavior will be predicted correctly. In FIR, the causal and temporal
relations among the fuzzy qualitative variables are represented by a mask matrix.
Equation 2 gives an example of a mask,

$$\begin{array}{c|cccc}
t \backslash x & i_1 & i_2 & i_3 & O \\ \hline
t - 2\delta t & 0 & 0 & 0 & -1 \\
t - \delta t & 0 & -2 & -3 & 0 \\
t & -4 & 0 & 0 & +1
\end{array} \qquad (2)$$

where $\delta t$ indicates the sampling period. A mask denotes a dynamic relationship
among qualitative variables. The negative elements represent the causal
relations between the inputs and the output (positive value in the mask). The
sequence in which they are enumerated is immaterial. In position notation the
mask of equation 2 can be written as (4, 6, 7, 9, 12), enumerating the mask from
top to bottom and from left to right.
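
The position notation can be reproduced with a short Python sketch; the function name `position_notation` is hypothetical. The mask is scanned row by row and the 1-based index of every non-zero entry is recorded.

```python
def position_notation(mask):
    """List the 1-based positions of the non-zero mask entries, scanning row by row."""
    n_cols = len(mask[0])
    return [r * n_cols + c + 1
            for r, row in enumerate(mask)
            for c, value in enumerate(row)
            if value != 0]

mask = [
    [0, 0, 0, -1],   # t - 2*dt
    [0, -2, -3, 0],  # t - dt
    [-4, 0, 0, 1],   # t (the positive entry marks the output)
]
print(position_notation(mask))  # [4, 6, 7, 9, 12]
```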

A quality value, based on an entropy reduction measure, is computed for
each mask considered. In [4] the quality function is described in detail. The
mask with the highest quality is called the optimal mask. It is important to note
that the optimality of the mask is evaluated with respect to the identification
(training) data set. Therefore, the best mask is not necessarily the one that
achieves the best forecast of the test data.

In this study the quality function that evaluates the information associated
with the mask is used as the cost function. In that way, no prediction is needed
in the partition evaluation process. Therefore, only the fuzzification and the
model identification processes of the FIR methodology (see figure 1) are executed to
compute the cost function for a specific partition. This considerably reduces the
execution time of the proposed simulated annealing algorithm.

Cooling Schedule 

Let us now take a look at all the parameters that make up the cooling schedule.
The initial temperature depends on the initial solution generated and is
computed using equation 3,

$$T_0 = \frac{\mu}{-\ln(\phi)} \cdot Cost(S_0) \qquad (3)$$

where $Cost(S_0)$ evaluates the cost function of the initial solution ($S_0$), and
$\mu, \phi \in [0, 1]$ are parameters whose values depend on the number of variables $N$
of the application, as described in equation 4.

$$\mu = 0.3,\ \phi = 0.3 \ \text{ if } N < 3; \qquad \mu = 0.1,\ \phi = 0.1 \ \text{ if } N > 3 \qquad (4)$$

Equations 3 and 4 imply that, initially, a solution that is a fraction $\mu$ worse
than the initial solution is accepted with probability $\phi$.

Two different cooling functions are predominantly used, i.e. linear and
proportional. In this work, the proportional cooling function proposed by Kirkpatrick
[7] is used to decrement the temperature between successive stages. This function
is presented in equation 5.

$$T_{k+1} = \alpha \cdot T_k \quad \text{with } \alpha = 0.9 \qquad (5)$$

The number of transitions needed to reach the quasi-equilibrium for each
temperature is defined by means of two values: the maximum number of transitions,
i.e. iterations in the inner loop, and the maximum number of accepted solutions.
The maximum number of iterations is set to $N^3$ and the maximum number of
accepted solutions is set to $N^2$, where $N$ is the number of system variables.
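
The initial temperature of equation 3 and the proportional cooling of equation 5 can be sketched as follows. This is an illustrative Python fragment; the function names and the example cost value 0.4 are assumptions.

```python
import math

def initial_temperature(initial_cost, mu, phi):
    """T0 such that a solution a fraction mu worse than the initial one
    is accepted with probability phi (Boltzmann constant absorbed into T)."""
    return mu / -math.log(phi) * initial_cost

def cooling(t0, alpha=0.9):
    """Yield the temperatures of successive stages under proportional cooling."""
    t = t0
    while True:
        yield t
        t *= alpha

t0 = initial_temperature(initial_cost=0.4, mu=0.1, phi=0.1)
schedule = cooling(t0)
print([round(next(schedule), 5) for _ in range(3)])
```

Substituting $T_0$ back into the acceptance probability gives $\exp(-\mu \cdot Cost(S_0) / T_0) = \phi$, which is the property the initial temperature is chosen to satisfy.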

Three stop criteria have been used in this study. The simulated annealing
algorithm stops when the number of solutions generated is greater than the maximum
number of possible solutions ($maxNC^N$), when the last iteration of the outer loop has
finished with no accepted solutions, or when $N$ iterations of the outer loop have been
completed without an enhancement of the global solution, i.e. the best solution has not
changed during the last $N$ iterations. It is important to remark here that if the algorithm stops
due to the first criterion no advantage is obtained with respect to an exhaustive
search. Moreover, the annealing algorithm does not guarantee that the optimal
solution is found. The main algorithm is presented next.

function [BestSol] = Annealing(N, maxNC, data)

% Proportional cooling factor (equation 5)
alpha = 0.9;

% A first solution (Current Solution) and an initial temperature (T) are
% generated. The evaluation of the cost function for the initial partition
% is also computed and stored in the CurrentSol structure
[CurrentSol, T] = GenerateCurrentSol(N, maxNC, data);

% The Current Solution is the Best Solution so far
BestSol = CurrentSol;

% The Current Solution is stored in the list of generated solutions
SolList = [CurrentSol];

% The total number of solutions generated is set to one
NumberSolutions = 1;

% Initialization of both the number of iterations without a global
% enhancement and the flag for iterations without accepted solutions.
IterNoGlobalEnhance = 0;
IterNoAcceptedSol = 0; % boolean variable

% Loop that sets the temperature
while (NumberSolutions <= (maxNC^N)) & (IterNoGlobalEnhance < N) ...
      & (~IterNoAcceptedSol),

  % Initialization of the boolean variable that establishes if a global
  % enhancement has been produced, the number of accepted solutions and
  % the number of iterations for the current temperature
  GlobalEnhance = 0; % boolean variable
  NumAcceptSol = 0;
  NumIter = 0;

  % Loop that runs the Metropolis Monte Carlo simulation
  while (NumIter < N^3) & (NumAcceptSol < N^2),

    % The number of iterations is incremented
    NumIter = NumIter + 1;

    % The GenerateCandidateSol function generates the Candidate Solution.
    % If this solution is not in the list of generated solutions, it evaluates
    % its cost function and includes both values in the solution list. If the
    % solution is already in the list (it has been generated one or more times
    % in the past), the cost function is available and it is not computed again.
    [CandidateSol, SolList] = GenerateCandidateSol(N, maxNC, CurrentSol, SolList, data);

    % The total number of solutions generated is incremented
    NumberSolutions = NumberSolutions + 1;

    % The difference between the cost function of the Candidate Solution and
    % the cost function of the Current Solution is stored in the Delta variable
    Delta = CandidateSol.cost - CurrentSol.cost;

    % Condition for the acceptance of the Candidate Solution
    if (rand(1) < exp(-Delta/T)) | (Delta < 0)

      % When accepted, the Candidate Solution becomes the Current Solution
      CurrentSol = CandidateSol;

      % The number of accepted solutions is incremented
      NumAcceptSol = NumAcceptSol + 1;

      % If the Current Solution has a lower cost function value than
      % the Best Solution, the Best Solution is updated
      if (CurrentSol.cost < BestSol.cost)
        BestSol = CurrentSol;
        GlobalEnhance = 1; % boolean variable
      end;
    end;
  end;

  % The temperature is decremented
  T = alpha*T;

  % The IterNoAcceptedSol and IterNoGlobalEnhance variables are updated
  % once the quasi-equilibrium is reached for the current temperature
  IterNoAcceptedSol = (NumAcceptSol == 0);
  if GlobalEnhance
    IterNoGlobalEnhance = 0;
  else
    IterNoGlobalEnhance = IterNoGlobalEnhance + 1;
  end;
end;
return



Fig. 2. Simulated Annealing algorithm for the automatic determination of fuzzy
partitions in FIR methodology (implemented in the Matlab 6.5 language)

3 Ozone Concentration

The main air pollution problem that has been identified in the Mexico City
metropolitan area (MCMA) is the formation of photochemical smog, primarily ozone (O3).
High levels of ozone cause eye irritation, respiratory disorders, crop damage and
an increased deterioration rate of materials. In these circumstances, it is important
and useful to provide early warnings of high levels of ozone concentration so
that the authorities can react as fast as possible. Therefore, the construction of
ozone models that capture the behavior of this gas in the atmosphere as
precisely as possible is of interest not only for environmental scientists but also
for government agencies. There are many different models available for local
scale predictions of air quality and for ozone level forecasting. In recent years
paradigms such as neural networks [8], decision trees or association rules [9] have
been used for this purpose. In [6], the FIR methodology has been used to model the
ozone contaminant in the central region of Mexico City. Seven variables are
involved in this study. The input variables are hour of day hd (from 0 to 23),
day of week dw (from 1 to 7), wind speed ws, measured in meters per second
(m/s), wind direction wd, measured in degrees (from 0° to 359°), temperature
t, measured in °C, and relative humidity hu, measured in percentage (from 0%
to 100%). The ozone o3, measured in parts per million (ppm), is the system's
output variable. Ozone and weather data were available from January to May
2000 and contain missing values. The data of the first four months are used as the
identification data set, whereas the month of May is used as the test data set. The
mean square error in percentage (MSE) is used to determine the validity of each
of the models.
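
The text does not spell out how the MSE in percentage is computed. One common normalization, assumed here purely for illustration, divides the mean square error by the variance of the observed signal. Under that assumption a value of 100% corresponds to a model that is no better than always predicting the mean, and values above 100% are possible. The function name `mse_percent` is hypothetical.

```python
def mse_percent(observed, predicted):
    """Mean square error normalized by the variance of the observed signal, in percent.

    Assumed definition; the observed signal must not be constant.
    """
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((y - mean) ** 2 for y in observed) / n
    mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n
    return 100.0 * mse / variance

observed = [1.0, 2.0, 3.0, 4.0]
print(mse_percent(observed, [2.5] * 4))  # predicting the mean everywhere
```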

In [6] three different partitions were studied to find the model with
the best prediction performance. The optimal masks found by performing an
exhaustive search for the three partitions studied are presented in the first three
rows of table 1.



Table 1. Partition results obtained in the previous work (first three rows) and by the
Simulated Annealing algorithm (last three rows)

Partition                      Optimal Mask    Quality   MSE test
(hd, dw, ws, wd, t, hu, o3)
(3, 3, 3, 3, 3, 3, 3)          (1,14,21)       0.595      52.08%
(5, 5, 4, 5, 4, 4, 4)          (4,14,17,21)    0.537     145.7%
(6, 6, 2, 3, 3, 4, 2)          (1,14,21)       0.738      39.36%
(3, 2, 6, 4, 6, 5, 2)          (1,14,17,21)    0.736      34.90%
(3, 3, 4, 2, 4, 2, 2)          (1,14,17,21)    0.763      38.33%
(3, 2, 5, 2, 3, 2, 2)          (1,14,17,21)    0.757      38.45%


In table 1, the first column contains the number of classes for each variable
(partition). The second column describes the optimal mask associated with
that partition, in position notation. The third column contains the quality of
the optimal mask, i.e. the cost function in the simulated annealing algorithm
proposed. The last column contains the MSE prediction error of the test data
set. The last three rows show the partitions proposed by the simulated annealing
algorithm when running it several times (more than 20). As can be observed
from table 1, the partitions chosen by hand in the previous work generally have lower
quality and worse prediction performance than the partitions suggested by the
simulated annealing algorithm. The partitions recommended by the SA algorithm have similar
qualities and performances, and any of them can be used as a good partition
parameter in the fuzzification process of the FIR methodology. Choosing a partition
without a previous criterion is a big risk that the modeler can avoid by using the
SA algorithm presented. Clearly, the SA algorithm is a very useful tool that
allows the modeler to start using FIR in a more efficient way.

4 Conclusions 

In this paper, a simulated annealing algorithm for the automatic tuning of fuzzy
partitions in the context of the fuzzy inductive reasoning methodology has been
presented. The SA algorithm suggests, for each variable, the number of classes
into which it should be discretized, basing its decision on the quality of the best mask
associated with that partition. The use of the SA algorithm for the modeling of the ozone
contaminant in Mexico City shows the potential of this approach.

Abstract. Morphological associative memories (MAMs) use a lattice
algebra approach to store and recall pattern associations. The lattice
matrix operations endow MAMs with properties that are completely
different from those of traditional associative memory models. In the present
paper, we focus our attention on morphological bidirectional associative
memories (MBAMs) capable of storing and recalling non-boolean patterns
degraded by random noise. The notions of morphological strong
independence (MSI), minimal representations, and kernels are extended
to provide the foundation of bidirectional recall when dealing with noisy
inputs. For arbitrary pattern associations, we present a practical solution
to compute kernels in MBAMs by induced MSI.



1 Introduction 

The foundation of morphological associative memories was established in [9],
where it was proved that morphological auto-associative memories have
unlimited storage capacity and provide perfect recall for noncorrupted boolean inputs,
in contrast to traditional associative memories based on correlation encoding
such as the classical Hopfield auto-associative memories [3,6]. Correlation
encoding requires that the key vectors be orthogonal in order to exhibit perfect
recall of the fundamental associations [1,4]. The morphological auto-associative
memory does not restrict the domain of the key vectors in any way. Thus, as
many associations as desired can be encoded into the memory; one step
convergence and perfect recall of boolean noisy patterns using the idea of kernels
were also settled [9]. Furthermore, the theoretical framework for morphological
bidirectional associative memories, developed in [10], showed again that for
some binary pattern classes, MBAMs have large storage capacity and superior
bidirectional recall compared to traditional BAM models [5], and are also competitive with
other feedforward BAM networks [15]. A characterization of kernel vectors for
binary patterns that provided a direct method for kernel computation, as
well as bounds for the allowable amount of corruption of the exemplar patterns
that guarantee perfect recall, appeared in [13]. An additional development that
uses the notion of dual kernels to enhance the error correction capability of
binary auto-associative morphological memories has been introduced in [14]. By
redefining the notion of kernels, together with new concepts such as
morphological strong independence and minimal representations of exemplar non-boolean
patterns, MAMs were shown to be robust in the presence of noise [11,12].

Our work is organized as follows: Section 2 gives a brief background of lattice
matrix operations for dealing with MAMs, and Section 3 provides an overview
of the main results obtained from previous research on MAMs, where greyscale
image pattern associations are used to illustrate their performance. Section 4
presents the known theoretical results related to MBAMs for non-boolean
patterns, including the kernel methodology for storing and recalling associations
based on the notions of morphological strong independence (MSI) and minimal
representations. Section 5 presents a new procedure for the computation of
kernels by induced MSI. Finally, in Section 6 we give our conclusion to the research
presented here.

2 Lattice Matrix Algebra 

Lattice matrix operations are defined componentwise using the binary operations
of the bounded lattice-group algebraic structure of $\mathbb{R}_{\pm\infty} = \mathbb{R} \cup \{-\infty, +\infty\}$ [2,8].
The binary operators for the maximum or minimum of two numbers are denoted
with the “join” and “meet” symbols employed in lattice theory, i.e., $x \vee y = \max(x, y)$
and $x \wedge y = \min(x, y)$. For example, the maximum of two matrices
$A, B$ of the same size $m \times n$ is defined as $(A \vee B)_{ij} = a_{ij} \vee b_{ij}$, for all $i = 1, \ldots, m$
and $j = 1, \ldots, n$. Inequalities between matrices are also verified elementwise,
e.g., $A \le B$ if and only if $a_{ij} \le b_{ij}$. On the other hand, the conjugate matrix $A^*$
is defined as $-A^t$ where $A^t$ denotes usual matrix transposition, or equivalently,
$(A^*)_{ij} = -a_{ji}$, hence $(A \vee B)^* = A^* \wedge B^*$. In addition, for appropriately sized
matrices $A, B$, the $ij$th entry of the max-sum and the min-sum of $A$ and $B$ is
defined respectively, for all $i = 1, \ldots, m$ and $j = 1, \ldots, n$, as follows

$$(A \,\boxed{\vee}\, B)_{ij} = \bigvee_{k=1}^{p} (a_{ik} + b_{kj}) \quad \text{and} \quad (A \,\boxed{\wedge}\, B)_{ij} = \bigwedge_{k=1}^{p} (a_{ik} + b_{kj}) , \qquad (1)$$

where, e.g., $\bigwedge_{k=1}^{p} a_k$ is the minimum of the set of numbers $\{a_1, \ldots, a_p\}$.

The relationship $(A \,\boxed{\vee}\, B)^* = B^* \,\boxed{\wedge}\, A^*$ holds for any $A, B$, and establishes the
duality between both types of lattice matrix sums. Finally, the morphological
outer sum of two vectors $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ is given by the $m \times n$ matrix
(note that $y \oplus x^t = y \,\boxed{\vee}\, x^t = y \,\boxed{\wedge}\, x^t$)

$$y \oplus x^t = \begin{pmatrix} y_1 + x_1 & \cdots & y_1 + x_n \\ \vdots & \ddots & \vdots \\ y_m + x_1 & \cdots & y_m + x_n \end{pmatrix} . \qquad (2)$$
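
The lattice matrix operations of this section can be sketched in plain Python with matrices stored as lists of rows. The function names are hypothetical and the matrices `A` and `B` are arbitrary test data; the last line checks the duality between the max-sum and the min-sum numerically.

```python
def max_sum(a, b):
    """Max-sum: entry (i, j) is the maximum over k of a[i][k] + b[k][j]."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def min_sum(a, b):
    """Min-sum: entry (i, j) is the minimum over k of a[i][k] + b[k][j]."""
    return [[min(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conjugate(a):
    """Conjugate matrix: the negated transpose."""
    return [[-a[i][j] for i in range(len(a))] for j in range(len(a[0]))]

def outer_sum(y, x):
    """Morphological outer sum: entry (i, j) is y[i] + x[j]."""
    return [[yi + xj for xj in x] for yi in y]

A = [[1, 4], [0, -2], [3, 3]]   # 3 x 2
B = [[2, 0, 5], [-1, 6, 1]]     # 2 x 3
print(max_sum(A, B))
print(conjugate(max_sum(A, B)) == min_sum(conjugate(B), conjugate(A)))  # True
```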






3 Morphological Associative Memories 

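For a given set of pattern associations $\{(x^\xi, y^\xi) \in \mathbb{R}^n \times \mathbb{R}^m : \xi = 1, \ldots, k\}$ we
define a pair of associated pattern matrices $(X, Y)$, where $X = (x^1, \ldots, x^k)$ and
$Y = (y^1, \ldots, y^k)$. Thus, $X$ is of dimension $n \times k$ with $i,j$th entry $x_i^j$ and $Y$ is of
dimension $m \times k$ with $i,j$th entry $y_i^j$. To store $k$ vector pairs $(x^1, y^1), \ldots, (x^k, y^k)$
in an $m \times n$ MAM we use the morphological outer sum as follows [9]. The
min-memory $W_{XY}$ and the max-memory $M_{XY}$ that store a set of pattern associations
$(X, Y)$ are given, respectively, by the expressions

$$W_{XY} = Y \,\boxed{\wedge}\, X^* = \bigwedge_{\xi=1}^{k} \left[ y^\xi \oplus (-x^\xi)^t \right] \quad \text{or} \quad w_{ij} = \bigwedge_{\xi=1}^{k} (y_i^\xi - x_j^\xi) , \qquad (3)$$

$$M_{XY} = Y \,\boxed{\vee}\, X^* = \bigvee_{\xi=1}^{k} \left[ y^\xi \oplus (-x^\xi)^t \right] \quad \text{or} \quad m_{ij} = \bigvee_{\xi=1}^{k} (y_i^\xi - x_j^\xi) . \qquad (4)$$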

We speak of a hetero-associative morphological memory (HMM) if $X \ne Y$ and
an auto-associative morphological memory (AMM) if $X = Y$. From (2), for each
$\xi$, $y^\xi \oplus (-x^\xi)^t$ is a matrix of size $m \times n$ that memorizes the association pair
$(x^\xi, y^\xi)$, hence $W_{XY} = \bigwedge_{\xi=1}^{k} [y^\xi \oplus (-x^\xi)^t]$ and $M_{XY} = \bigvee_{\xi=1}^{k} [y^\xi \oplus (-x^\xi)^t]$, which suggests the given
names. We use $w_{ij}$ and $m_{ij}$ as an alternative notation for the $ij$th entries of $W_{XY}$
and $M_{XY}$ if there is no confusion about which association is under discussion.
Since $M_{YX}^* = (X \,\boxed{\vee}\, Y^*)^* = Y \,\boxed{\wedge}\, X^* = W_{XY}$ and $W_{YX}^* = (X \,\boxed{\wedge}\, Y^*)^* = Y \,\boxed{\vee}\, X^* = M_{XY}$,
the retrieval of pattern $y^\xi$ from pattern $x^\xi$ can be expressed using the
direct memory schemes (the vertical bar means “or”),

$$x^\xi \to \{ W_{XY} \mid M_{XY} \} \to y^\xi , \qquad (5)$$

where either one of $W_{XY}$ or $M_{XY}$ or their corresponding duals may be used. In a
similar fashion, $W_{YX} = M_{XY}^*$ and $M_{YX} = W_{XY}^*$, hence recalling the pattern $x^\xi$
from pattern $y^\xi$ can be realized using the conjugate or reverse memory schemes,

$$y^\xi \to \{ W_{YX} \mid M_{YX} \} \to x^\xi . \qquad (6)$$

The conditions of perfect recall for perfect input were established in [9] and we
repeat them here for convenience. Specifically, $W_{XY} \,\boxed{\vee}\, X = Y$ or $M_{XY} \,\boxed{\wedge}\, X = Y$,
if and only if, for each row index $i \in \{1, \ldots, m\}$ and each pattern index $\gamma \in
\{1, \ldots, k\}$, there exists an index $j \in \{1, \ldots, n\}$ which depends on $i, \gamma$, such that

$$y_i^\gamma - x_j^\gamma = w_{ij} \quad \text{or} \quad y_i^\gamma - x_j^\gamma = m_{ij} . \qquad (7)$$
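
A small numerical illustration of the memories of equations (3) and (4) and of the recall schemes is given below as a Python sketch. Patterns are stored as lists of vectors rather than as matrix columns, the function names are hypothetical, and the patterns are arbitrary toy data chosen so that condition (7) holds.

```python
def memories(X, Y):
    """Return (W_XY, M_XY) for k input vectors X (length n) and k output vectors Y (length m)."""
    k, n, m = len(X), len(X[0]), len(Y[0])
    W = [[min(Y[s][i] - X[s][j] for s in range(k)) for j in range(n)] for i in range(m)]
    M = [[max(Y[s][i] - X[s][j] for s in range(k)) for j in range(n)] for i in range(m)]
    return W, M

def recall_max(W, x):
    """Max-sum of a memory with an input vector (used with the min-memory W)."""
    return [max(w + xj for w, xj in zip(row, x)) for row in W]

def recall_min(M, x):
    """Min-sum of a memory with an input vector (used with the max-memory M)."""
    return [min(m + xj for m, xj in zip(row, x)) for row in M]

X = [[1, 0, 2], [0, 3, 1]]   # two input patterns, n = 3
Y = [[2, 1], [3, 0]]         # two output patterns, m = 2
W, M = memories(X, Y)
print([recall_max(W, x) == y for x, y in zip(X, Y)])

# Auto-associative case: every stored pattern is recalled perfectly.
Wxx, Mxx = memories(X, X)
print(all(recall_max(Wxx, x) == x and recall_min(Mxx, x) == x for x in X))  # True
```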

It is important to remark that the conditions for perfect recall using MAMs
may not be satisfied for arbitrary association pairs $(X, Y)$, with $X \ne Y$, that
arise in most practical applications. However, even in the case that for each
pattern $x^\xi$ several row indexes do exist for which the expressions in (7) are
not satisfied, the memories $W_{XY}$ and $M_{XY}$ still provide a storing mechanism
with almost perfect recall, in the sense of a suitable distance measure, between
the original pattern $y^\xi$ and the recalled pattern $\tilde{y}^\xi$. In particular, we use the
normalized mean square error (NMSE), denoted by $e(y^\xi, \tilde{y}^\xi)$, to quantify the
difference between $y^\xi$ and $\tilde{y}^\xi$ when recalling stored patterns by means of a
specific hetero-associative memory scheme. The following example involving
non-boolean patterns of high dimensionality illustrates our claim.

Consider the five pattern image associations $(p^1, q^1), \ldots, (p^5, q^5)$ shown in
Fig. 1. Each individual pattern $p^\xi$ or $q^\xi$ is a $50 \times 50$ pixel, 256 gray-level
image. For uncorrupted input, almost perfect recall is obtained if we use either
of the memory schemes given by (5) or (6). Using the standard row-scan
method, each pattern image, e.g., $p^\xi$, can be converted into a pattern vector
$x^\xi = (x_1^\xi, \ldots, x_{2500}^\xi)$ of $X$ by defining $x^\xi_{50(r-1)+c} = p^\xi(r, c)$ for
$r, c = 1, \ldots, 50$ (the image $q^\xi$ is similarly converted into the vector $y^\xi$ of $Y$). Figure 2
shows the results when applying the memory scheme of (5) using the canonical
memories $W_{XY}$ and $M_{XY}$. A visual inspection does not immediately reveal the
hidden differences that cause the recall to be non-perfect since e(y|,y|) « 10“^ 
for $\xi = 1, \ldots, 5$. Although, for a given arbitrary set $(X, Y)$ of pattern associations,
the HMMs $W_{XY}$ (or $W_{YX}$) and $M_{XY}$ (or $M_{YX}$) are not necessarily perfect
recall memories, they can still be applied successfully to deal with noisy inputs.




Fig. 1. The association $(X, Y)$ that was used in constructing the memories $W_{XY}$ and
$M_{XY}$ (of size 2500 x 2500). First row: patterns of $X$; second row: patterns of $Y$



4 MBAMs and the Kernel Method 

From [10], the conjugate morphological memories, $M_{YX} = W_{XY}^*$ and $W_{YX} =
M_{XY}^*$, also denoted by $W^*$ and $M^*$, perform the feedback scheme for bidirectional
recall in a MBAM. The basic association mechanisms for perfect input in the $X$
to $Y$ direction are given by the following one-step procedure without thresholding

$$x^\xi \to \{ W_{XY} \mid M_{XY} \} \to y^\xi \to \{ M_{YX} \mid W_{YX} \} \to x^\xi . \qquad (8)$$

Fig. 2. The first row displays the associated $Y$ patterns as recalled by the memory
$W_{XY}$; the second row displays the associated $Y$ patterns as recalled by the memory
$M_{XY}$



Again, perfect recall in MBAMs requires that relations similar
to (7) (for MAMs) be satisfied in both the forward and feedback paths for the memories
involved. Even if perfect recall cannot be accomplished, MBAMs allow for heavy
overlap of features, as was demonstrated in [10] using boolean patterns.

We now turn our attention to noisy patterns. Let $I = \{1, \ldots, n\}$; a distorted
version $\tilde{x}^\gamma$ of the pattern $x^\gamma$ has undergone an erosive change whenever $\tilde{x}^\gamma \le x^\gamma$,
or equivalently if $\forall i \in I$, $\tilde{x}_i^\gamma \le x_i^\gamma$. A dilative change occurs whenever $\tilde{x}^\gamma \ge x^\gamma$,
or equivalently if $\forall i \in I$, $\tilde{x}_i^\gamma \ge x_i^\gamma$. Let $L, G \subset I$ be two non-empty disjoint sets
of indexes; if $\forall i \in L$, $\tilde{x}_i^\gamma < x_i^\gamma$ and $\forall i \in G$, $\tilde{x}_i^\gamma > x_i^\gamma$, then the distorted pattern
$\tilde{x}^\gamma$ is said to contain random noise. In order to deal efficiently with corrupted
versions of exemplar patterns, the kernel method has proven to be useful in the
binary case for MAMs [9,13] and MBAMs [10]. Here, we will extend the kernel
technique in MBAMs to store and recall non-boolean pattern associations.
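
The three kinds of distortion can be told apart with a short Python sketch. The function name `noise_type` and the returned labels are hypothetical, and the example vectors are toy data.

```python
def noise_type(original, distorted):
    """Classify a distorted pattern as erosive, dilative or random noise."""
    lower = any(d < o for o, d in zip(original, distorted))
    higher = any(d > o for o, d in zip(original, distorted))
    if lower and higher:
        return "random"     # some entries decreased and others increased
    if lower:
        return "erosive"    # distorted <= original in every entry
    if higher:
        return "dilative"   # distorted >= original in every entry
    return "none"

x = [5, 1, 1]
print(noise_type(x, [5, 0, 1]), noise_type(x, [7, 1, 1]), noise_type(x, [5, 3, 0]))
```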

The underlying idea of the kernel technique is to define a memory $M$ which
associates with each input pattern $x^\gamma$ an intermediate eroded pattern $z^\gamma$ called
the kernel pattern. Another associative memory $W$ is defined which associates
each kernel pattern $z^\gamma$ with the desired output pattern $y^\gamma$. In terms of min-max
sums, one obtains the equation $W \,\boxed{\vee}\, (M \,\boxed{\wedge}\, x^\gamma) = y^\gamma$. The combination of the two
morphological memories $M$ and $W$ is what motivated the following definitions
and results (proved in [12]); for application purposes we assume that pattern
features are non-negative, i.e., $x_i^\gamma \ge 0$ for all $i, \gamma$.

Definition 4.1. Let $Z = (z^1, \ldots, z^k)$ be an $n \times k$ matrix. We say that $Z$ is a
kernel for $(X, Y)$ with $X \ne Y$, if and only if $Z \ne X$ and there exists a memory
$W$ such that $W \,\boxed{\vee}\, (M_{ZZ} \,\boxed{\wedge}\, x^\gamma) = y^\gamma$.

Definition 4.2. A set of patterns $Z \le X$ is said to be a minimal representation
of $X$ if and only if for $\gamma = 1, \ldots, k$, $z^\gamma \wedge z^\xi = 0$ $\forall \xi \ne \gamma$, $z^\gamma$ contains at most
one non-zero entry, and $W_{XX} \,\boxed{\vee}\, z^\gamma = x^\gamma$.

Definition 4.3. A set of pattern vectors X is said to be morphologically strongly 
independent (MSI) if and only if, $\forall \xi \ne \gamma$, the next two conditions are satisfied: 

1. $\forall \gamma \in \{1, \ldots, k\}$, $x^\gamma \not\le x^\xi$. 

2. $\forall \gamma \in \{1, \ldots, k\}$, there exists $j_\gamma \in \{1, \ldots, n\}$ such that 

$x^\xi_{j_\gamma} - x^\xi_i \le x^\gamma_{j_\gamma} - x^\gamma_i \quad \forall i \in \{1, \ldots, n\}$ . (9) 

Theorem 4.1. If X is morphologically strongly independent, then there exists 
a set of patterns $Z \le X$ with the property that for $\gamma \in \{1, \ldots, k\}$ 

1. $\forall \xi \ne \gamma$, $z^\gamma \wedge z^\xi = 0$, 

2. $z^\gamma$ contains at most one non-zero entry, and 

3. $W_{XX} \,\boxed{\vee}\, z^\gamma = x^\gamma$. 

Corollary 4.1. If X and Z are as in Theorem 4.1, then Z is a minimal repre- 
sentation of X . 

Corollary 4.2. If X and Z are as in Theorem 4.1 and $W_{XY}$ is a perfect 
associative recall memory, then Z is a kernel for (X, Y) with $W = W_{XY} \,\boxed{\vee}\, W_{XX}$. 

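It is clear from Corollary 4.2 that the recall mechanism in MBAMs is given by 
the following feed-forward network 

$x^\xi \to M_{ZZ} \to W \to y^\xi \to M_{VV} \to \bar{W} \to x^\xi$ , (10) 

where $W = W_{XY} \,\boxed{\vee}\, W_{XX}$, $\bar{W} = W_{YX} \,\boxed{\vee}\, W_{YY}$, and V is a kernel for (Y, X). 
The conditions $W_{XX} \,\boxed{\vee}\, z^\xi = x^\xi$ and $W_{YY} \,\boxed{\vee}\, v^\xi = y^\xi$ are crucial for 
the recall capability of the memory scheme of (10) when presented with noisy 
inputs. Given a pair of minimal representations Z, V which are also kernels, 
respectively, for (X, Y) and (Y, X), and a noisy version $(\tilde{x}^\xi, \tilde{y}^\xi)$ of the 
pattern association $(x^\xi, y^\xi)$ having the property that $(z^\xi, v^\xi) \le (\tilde{x}^\xi, \tilde{y}^\xi)$ and 
$(M_{ZZ} \,\boxed{\wedge}\, \tilde{x}^\xi, M_{VV} \,\boxed{\wedge}\, \tilde{y}^\xi) \le (x^\xi, y^\xi)$, then it must follow that 

$W_{XX} \,\boxed{\vee}\, (M_{ZZ} \,\boxed{\wedge}\, \tilde{x}^\xi) = x^\xi$ and $W_{YY} \,\boxed{\vee}\, (M_{VV} \,\boxed{\wedge}\, \tilde{y}^\xi) = y^\xi$ . (11) 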

Although the performance of the proposed feed-forward MBAM network when 
presented with noisy inputs cannot be assured in a completely deterministic 
way, for any set (X, Y) of k associated patterns in $\mathbb{R}^n \times \mathbb{R}^m$, the expectation of 
recall capability is enhanced if $\min(n,m) \gg 0$ and $k \ll \min(n,m)$. 

5 Computation of Kernels 

From a theoretical point of view, Theorem 4.1 and its corollaries provide the 
foundation for the kernel method when applied to perfect inputs. In addition, the 
combined memory scheme suggested by (10) together with the kernel association 
shown in (11) provide a useful mechanism for bidirectional pattern recall of noisy 
inputs. On the other hand, it is clear that the condition of morphological strong 
independence of the sets X and Y will be rarely satisfied in practical situations 
and seems to be very restrictive in its possible applications. A practical solution 
to this dilemma is given by the following procedure. 
Algorithm 5.1. [MBAM kernels by induced MSI.] 

Step 1. Compute the global maximum U and the global minimum L of the 
input set X that has k patterns of dimension n, i.e., 
$U = \max(X) = \bigvee_{i=1}^{n} \bigvee_{\xi=1}^{k} x_i^\xi$ and $L = \min(X) = \bigwedge_{i=1}^{n} \bigwedge_{\xi=1}^{k} x_i^\xi$. 

Step 2. Let $I = \{1, \ldots, n\}$. For $\xi = 1, \ldots, k$, compute an index $i_\xi \in I$ where 
the first available maximum value occurs, i.e., let $x^\xi_{i_\xi} = \bigvee_{i \in I} x_i^\xi$; then set $I = I - \{i_\xi\}$, 
$\xi \leftarrow \xi + 1$, and recompute $i_\xi$ for the new pattern; hence $\forall \gamma \ne \xi$, $i_\gamma \ne i_\xi$. 

Step 3. Change the original pattern set X at all positions $i_\xi$ for $\xi = 1, \ldots, k$ 
with the U and L values determined in Step 1. Specifically, set $x^\gamma_{i_\xi} = U$ if $\gamma = \xi$; 
otherwise set it to L. It turns out that the modified pattern set, denoted by $\hat{X}$, 
is a morphologically strongly independent set. 

Step 4. Apply to $\hat{X}$ the kernel method and the morphological memory scheme 
as described in Section 4. The kernel Z of $\hat{X}$ is readily obtained from Step 3 by 
defining, for $i = 1, \ldots, n$ and $\xi = 1, \ldots, k$, $z_i^\xi = U$ if $i = i_\xi$; otherwise set it to 0. 

Step 5. Repeat Steps 1-4 for set Y to find the kernel V of $\hat{Y}$. In this final 
step, a two-way kernel (Z, V) has been determined for $(\hat{X}, \hat{Y})$. 
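Steps 1 to 4 of Algorithm 5.1 can be sketched for a single pattern set as follows. This is an illustrative reading of the algorithm, not the authors' implementation: patterns are assumed to be non-negative columns of an $n \times k$ array with $k \le n$, ties in Step 2 are broken by taking the lowest index, and the helper names (`induced_msi`, `min_memory`, `max_product`) are hypothetical. The two helpers use the standard min-memory and max-product definitions from the morphological memory literature.

```python
import numpy as np

def min_memory(X, Y):
    """W_XY: w_ij = min over patterns of (y_i - x_j); patterns are columns."""
    return (Y[:, None, :] - X[None, :, :]).min(axis=2)

def max_product(A, B):
    """Max product: c_ij = max_k (a_ik + b_kj)."""
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

def induced_msi(X):
    """Return the modified set X_hat, its kernel Z and the chosen indices."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    if k > n:
        raise ValueError("need at least as many features as patterns")
    U, L = X.max(), X.min()                      # Step 1
    available = np.ones(n, dtype=bool)
    idx = []
    for xi in range(k):                          # Step 2
        candidates = np.where(available, X[:, xi], -np.inf)
        i = int(np.argmax(candidates))           # first available maximum
        idx.append(i)
        available[i] = False
    X_hat = X.copy()
    Z = np.zeros_like(X)
    for xi, i in enumerate(idx):                 # Steps 3 and 4
        X_hat[i, :] = L                          # other patterns get L
        X_hat[i, xi] = U                         # pattern xi gets U
        Z[i, xi] = U                             # kernel: single entry
    return X_hat, Z, idx

X = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 9.0],
              [2.0, 6.0, 5.0],
              [3.0, 5.0, 8.0]])
X_hat, Z, idx = induced_msi(X)
W = min_memory(X_hat, X_hat)
recalled = max_product(W, Z)   # kernel patterns expanded back to X_hat
```

In this small example each kernel column has exactly one non-zero entry, and `recalled` equals `X_hat`, which is the third property of Theorem 4.1 applied to the modified set. Running the same function on Y gives V (Step 5).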

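To complete the description of Algorithm 5.1, we next prove that set $\hat{X}$ 
(similarly for $\hat{Y}$) satisfies both conditions for MSI of Definition 4.3: 

1. $\forall \gamma \in \{1, \ldots, k\}$, $\hat{x}^\gamma \not\le \hat{x}^\xi$ for all $\xi \ne \gamma$; that is, an index $j_\gamma \in \{1, \ldots, n\}$ 
exists such that $\hat{x}^\gamma_{j_\gamma} > \hat{x}^\xi_{j_\gamma}$ for all $\xi \ne \gamma$. That this is true follows immediately 
from the assignment made in Step 3 by making the choice $j_\gamma = i_\gamma$. 

2. $\forall \gamma \in \{1, \ldots, k\}$, there is an index $j_\gamma \in \{1, \ldots, n\}$ such that, $\forall i \in \{1, \ldots, n\}$, 
$\hat{x}^\gamma_{j_\gamma} - \hat{x}^\gamma_i \ge \hat{x}^\xi_{j_\gamma} - \hat{x}^\xi_i$; take again $j_\gamma = i_\gamma$, therefore 

$U - L = \bigvee_{i=1}^{n} (\hat{x}^\gamma_{j_\gamma} - \hat{x}^\gamma_i) \ge \hat{x}^\xi_{j_\gamma} - \hat{x}^\xi_i$ . $\Box$ 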

Essentially, the kernel computation suggested in Algorithm 5.1 introduces an 
alternative MBAM scheme that substitutes (10), as follows, 

$x^\xi \to \hat{x}^\xi \to M_{ZZ} \to W_{XX} \to W_{XY}$ 

$x^\xi \leftarrow W_{YX} \leftarrow W_{YY} \leftarrow M_{VV} \leftarrow \hat{y}^\xi \leftarrow y^\xi$ , (12) 

where the recollection mechanism is based on the modified pattern sets $\hat{X}$, $\hat{Y}$ 
rather than the original X, Y sets. Observe that induced MSI introduces a 
negligible amount of deterministic “artificial noise” to the original patterns which 
does not affect the MBAM performance if $\min(n,m) \gg 0$ and $k \ll \min(n,m)$. 
6 Conclusion 

Several key steps have been achieved for enhancing the recall capability as well 
as the error correction rate of morphological associative memories in the case 
of boolean patterns. However, the kernel technique in the non-boolean case has 
required more elaborate concepts to deal effectively with corrupted inputs. Mor- 
phological strong independence is a sufficient condition for building minimal 
representations and kernels; however, in most practical applications, the associ- 
ation pattern sets may not satisfy this requirement. Therefore, at the expense of 
reducing storage capacity, the induced MSI procedure presented here is useful 
for generating a two way kernel in a MBAM that is quite robust to random noise 
for arbitrary non-boolean associations. 
Abstract. Archiving of image data often requires a suitable data 
reduction to minimise the memory requirements. However, these compression 
procedures entail compression artefacts, which make machine processing 
of the captured documents more difficult and reduce subjective image 
quality for the human viewer. A method is presented which can reduce 
the occurring compression artefacts. The corrected image is obtained as the 
output of an auto-associative memory that is controlled by a Self-Organising 
Map (SOM). 



1 Introduction 

Standard image compression algorithms do not consider the image content for 
the selection of compression parameters. Often, the user has to do experiments 
with the compression parameters until his requirements are met. We propose a 
system for the correction of compression artefacts based on an associative mem- 
ory that is improved by a Self-Organizing Map (SOM) controlling its parameters 
(Fig. 1). 




Fig. 1. SOM-controlled associative memory 

The JPEG procedure (Joint Photographic Experts Group) [2,3] is typical 
for lossy image compression and an inherent part of image capturing devices 
such as scanners or digital cameras, often implemented in integrated circuits. 
The classical block-based JPEG procedure therefore formed the basis of our 
fundamental investigations, but the main ideas of our approach can also be 
applied to the wavelet-based methods of JPEG 2000. The examples are on the basis 
of gray-value images in order to simplify the explanations. The same method can 
be applied to each component of color images, of course. Consideration of the 
coupling of the basis colors can improve the results but is not further explained 
in this paper. 

Different approaches exist to describe image degradation [11]. Often, the 
deviation between the considered image and the error-free reference image is assessed by the 
sum or mean of error squares (or the square root of it: root mean square, RMS) 
as a quantitative measure. The signal-to-noise ratio (SNR) and the mean 
absolute difference (MAD) are also important criteria. The integrating property of 
these criteria is a serious drawback, and the subjective evaluation of an image 
often deviates from such simple measures. Subjective assessment of image quality 
depends strongly on the assessing group of persons [11]. In order to reduce the 
compression artefacts generated by JPEG compression, different solutions have 
been examined. The quantitative error measure for all investigated approaches 
has been the sum of error squares. This avoids expensive interviews for a 
subjective evaluation, and the results are easier to compare. In most cases, a lower 
quadratic error leads to a better subjective image quality. 
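The measures named above can be computed in a few lines. The sketch below is illustrative only; in particular, the SNR is taken here as the ratio of signal energy to error energy in decibels, which is one common convention and an assumption of this example rather than a definition given in the text.

```python
import numpy as np

def error_measures(reference, degraded):
    """Sum of squared errors, RMS, MAD and SNR (dB) between two images."""
    ref = np.asarray(reference, dtype=float)
    deg = np.asarray(degraded, dtype=float)
    err = ref - deg
    sse = float(np.sum(err ** 2))            # sum of error squares
    rms = float(np.sqrt(np.mean(err ** 2)))  # root mean square error
    mad = float(np.mean(np.abs(err)))        # mean absolute difference
    # Assumed SNR convention: signal energy over error energy, in dB.
    snr_db = float(10.0 * np.log10(np.sum(ref ** 2) / sse)) if sse > 0 else float("inf")
    return {"sse": sse, "rms": rms, "mad": mad, "snr_db": snr_db}

measures = error_measures([[10, 20], [30, 40]], [[12, 20], [30, 36]])
```

All four values integrate over the whole image, which is exactly the drawback noted above: a few strongly visible block artefacts and a faint uniform noise can yield the same number.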

An important assumption in our approach is that images of a certain im- 
age content can be assigned to classes of images. The variability of an image 
is limited in many typical cases such as text documents, cheque forms, traffic 
scenes, nature images etc. Internal coherences exist in the images of an image 
class, which should be generally exploited for redundancy reduction. This way, 
the compression procedure is optimised depending on the particular image class. 
The application of a procedure with well defined parameters for a particular 
image class to a different (dissimilar) image class can therefore lead to poor re- 
sults. An image-class-dependent correction seems to be a promising approach to 
compensate the loss of information caused by image compression. The missing 
information on the coherences within an image class is recalled from an associa- 
tive memory after the decompression. This way, the compression artefacts are 
to be reduced. 

2 An Auto-associative Recall for Compressed Image 
Blocks 

At first we describe the function of the associative memory. It is applied to the 
degraded archive images of a certain class. To consider calculation resources and 
the typical structure of JPEG we have designed associative memories according 
to image blocks of 8 by 8 pixels. For the k-th block of an image we have 

$$O_k = \begin{pmatrix} b_{IN_1,k} & b_{IN_9,k} & \cdots & b_{IN_{57},k} \\ b_{IN_2,k} & b_{IN_{10},k} & \cdots & b_{IN_{58},k} \\ \vdots & \vdots & & \vdots \\ b_{IN_8,k} & b_{IN_{16},k} & \cdots & b_{IN_{64},k} \end{pmatrix} ; \quad R_k = \begin{pmatrix} b_{JPG_1,k} & b_{JPG_9,k} & \cdots & b_{JPG_{57},k} \\ b_{JPG_2,k} & b_{JPG_{10},k} & \cdots & b_{JPG_{58},k} \\ \vdots & \vdots & & \vdots \\ b_{JPG_8,k} & b_{JPG_{16},k} & \cdots & b_{JPG_{64},k} \end{pmatrix} \quad (1)$$ 

where $O_k$ is the original image block and $R_k$ the resulting block after 
compression/decompression, with $b_{IN_{1 \ldots 64},k}$ and $b_{JPG_{1 \ldots 64},k}$ being the 8 × 8 gray values of 
the k-th image blocks. The gray values are numbered column-wise, obtaining the 
vectorized original and compressed/decompressed images, respectively: 



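$$B_{IN} = \begin{pmatrix} b_{IN_1,1} & \cdots & b_{IN_1,k} & \cdots & b_{IN_1,K} \\ b_{IN_2,1} & \cdots & b_{IN_2,k} & \cdots & b_{IN_2,K} \\ \vdots & & \vdots & & \vdots \\ b_{IN_{64},1} & \cdots & b_{IN_{64},k} & \cdots & b_{IN_{64},K} \end{pmatrix} ; \quad B_{JPG} = \begin{pmatrix} b_{JPG_1,1} & \cdots & b_{JPG_1,k} & \cdots & b_{JPG_1,K} \\ b_{JPG_2,1} & \cdots & b_{JPG_2,k} & \cdots & b_{JPG_2,K} \\ \vdots & & \vdots & & \vdots \\ b_{JPG_{64},1} & \cdots & b_{JPG_{64},k} & \cdots & b_{JPG_{64},K} \end{pmatrix} \quad (2)$$ 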

These K blocks can originate from the same image or from different images but 
their contents should possess similar properties to fulfil the desired requirement 
to represent a certain image class. The selection of suitable sample blocks pre- 
senting prototypes of the defined image class is crucial for a good correction. 
Different strategies to determine the sample blocks have been therefore investi- 
gated. For the description of the associative memory approach, we will now use 
Fig. 2. 



Fig. 2. Associative memory for the correction of compression artefacts (panels a, b and c) 



Investigations on real image data showed that the relationships between an 
original and a decompressed image can be approximately described by a set of lin- 
ear equations. This holds, of course, especially for image classes that are charac- 
terized by certain statistic properties. The process of compression/decompression 
(CODEC) can therefore be approximated by a system matrix U for a mathe- 
matically simple description (see Fig. 2a). Assuming an approximately linear 
transfer behaviour of the CODEC, the elements of U can be determined by 
minimisation of the error squares for a test data set. The number of samples 
must be greater than the number of pixels in the image block. This corresponds 
to the determination of the pseudo inverse [8,4]: 

$U = B_{JPG} B_{IN}^{+}$ . (3) 
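The least-squares estimate of the system matrix, read here as $U = B_{JPG} B_{IN}^{+}$ with $B_{IN}^{+}$ the pseudo-inverse, can be sketched with NumPy as follows. The example is hypothetical: it uses small 16-element vectors instead of 64-pixel blocks and a synthetic linear "codec" so that the estimate can be compared with a known matrix; `estimate_codec_matrix` is an illustrative name.

```python
import numpy as np

def estimate_codec_matrix(B_in, B_jpg):
    """Least-squares fit of U in B_jpg ~ U @ B_in (blocks stored as columns)."""
    n, K = B_in.shape
    if K <= n:
        raise ValueError("need more sample blocks than pixels per block")
    return B_jpg @ np.linalg.pinv(B_in)

rng = np.random.default_rng(0)
n, K = 16, 200                      # toy sizes; the paper uses n = 64
B_in = rng.uniform(0.0, 255.0, size=(n, K))
U_true = 0.9 * np.eye(n) + 0.01 * rng.standard_normal((n, n))
B_jpg = U_true @ B_in               # synthetic, exactly linear codec
U = estimate_codec_matrix(B_in, B_jpg)
```

For a real codec the relationship is only approximately linear, so `U` then minimises the sum of error squares over the sample blocks rather than reproducing them exactly.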

We are looking for a correcting system C that compensates the influences 
caused by U as far as possible. This means that our aim is to approximate $B_{IN}$ 
by $\hat{B}_{IN}$. Such systems can be considered as auto-associative memories. The 
investigation of the eigenvalues of the transfer matrix U resulted in a certain 
number of eigenvalues with a value of zero or close to zero. Here, a direct 
relationship to the lossy data compression exists, where higher frequency components 
are also often weighted less (reduced code length of JPEG for coefficients of 
higher frequency components). More eigenvalues of U are close to zero with 
increasing compression rate (lower quality). The relation between the number of zero 
values at higher frequencies and the compression rate is almost linear. This 
observed behaviour motivates the scheme of a classical auto-associative memory 
[1] according to Fig. 2b that is able to store a-priori knowledge on the image 
contents. The determination of $\Phi$ for a certain data set can be considered as a 
training that approximates $B_{IN}$ by $\hat{B}_{IN}$ (auto-association). The training of the 
auto-associative system can be expressed by 

Bin = Bin Bin (4) 

with a residual error caused by the interpolating behaviour and 

Bin = (5) 

with $\varphi_v$ being the v-th line of $\Phi$ and $a_v$ the v-th dimension-reduced signal. The 
mean square error between $B_{IN}$ and $\hat{B}_{IN}$ has to be minimized for each length of 
vector a. An eigenvalue problem results when considering the data of an image 
class (assuming the number of samples in the data set of this image class is much 
greater than one) as follows: 



BinBin^^ = A.^. ( 6 ) 

The transformation matrix $\Phi$ is calculated by the solution of the eigenvalue 
problem [4,6]. The matrix $\Lambda$ contains the eigenvalues of $B_{IN} B_{IN}^{T}$ in the main 
diagonal. This dimension reduction is crucial for the overall system because it 
stands for the inherent a-priori knowledge that is stored in $\Phi$. The corrected image 
is recalled from the left part of the associative memory (b) by the Karhunen-Loeve 
coefficients a. 
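The auto-associative training described above amounts to a truncated Karhunen-Loeve transform of the sample blocks. The following sketch assumes a convention that the extracted text does not pin down: the leading eigenvectors of $B_{IN} B_{IN}^{T}$ are stored as the columns of `Phi`, the coefficients are `a = Phi.T @ B`, and the recalled blocks are `Phi @ a`. Function names and toy dimensions are illustrative.

```python
import numpy as np

def klt_basis(B_in, m):
    """Leading m eigenvectors (columns) and eigenvalues of B_in @ B_in.T."""
    eigvals, eigvecs = np.linalg.eigh(B_in @ B_in.T)  # symmetric matrix
    order = np.argsort(eigvals)[::-1][:m]             # largest first
    return eigvecs[:, order], eigvals[order]

def auto_associate(Phi, B):
    """Project blocks onto the basis and recall them from the coefficients."""
    a = Phi.T @ B          # dimension-reduced signal
    return Phi @ a         # recalled (approximated) blocks

# Toy image class: 8-dimensional blocks that live in a 3-dimensional subspace.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
B = Q @ rng.standard_normal((3, 50))
Phi, lam = klt_basis(B, 3)
B_hat = auto_associate(Phi, B)
```

Because the toy data have rank 3, three basis vectors recall every block without residual error; for real image blocks the truncation length controls how much of the class-specific structure is kept.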

Because the CODEC produces image data, a transfer matrix T producing 
vector a (Fig. 2b) is separated from $\Phi$. The behaviour of the eigenvalues of 
U (increasing compression gives fewer eigenvalues different from 0) leads to the conclusion 
that the first part of the auto-associative memory (Fig. 2b, which can be considered 
as a forward transform) models the process of compression/decompression if the 
length of vector a is limited (Fig. 2c). The same holds for the correcting 
system C consisting of T and $\Phi$. 

Bjpg = UBin models the compression/decompression approximation by 
the associative memory and from Fig. 2c follows Bin = ^TBjpq- This leads to 
T = , with U~^ being the Moore-Penrose inverse [4] and to the resulting 

correction matrix 



C = T^= «? . 



( 7 ) 
3 An SOM for Image Block Classification 

A further improvement of the results of correction can be expected if special 
transformation matrices $\Phi$ and T are determined according to each single image 
class. An automatic and adaptive classification is desirable. Artificial neural 
networks are well known and suitable for classification tasks. 

Because a direct learning target for the image blocks and the centre of gravity 
of the classes is hard to define, an SOM seems to be appropriate for this 
application. The SOM was suggested and developed by Kohonen in the early ’80s [5] 
and was then established as a powerful and adaptive tool for clustering and 
visualization [10]. It belongs to the group of unsupervised trained artificial neural 
networks with a close relationship to biological signal processing [7]. Due to its 
inherent properties and biological origin, the SOM seems to be well suited to be 
implemented in the complex image processing system described in this paper. 



Fig. 3. Scheme of the SOM for the creation of image block classes (labels: input layer; weight vector; Kohonen layer; neuron with neighbourhood representing an image class) 



It performs an ordered non-linear mapping of high-dimensional input data 
onto a usually two-dimensional rectangular grid of neurons. This reduction of 
the dimensionality is characterised by an abstraction to important properties 
contained in the input data. At the same time insignificant information is reduced 
or even removed. The SOM generally allocates more neurons for inputs that 
occur more frequently in the input space (magnification factor). This improves 
the local resolution in these areas. This way the resources, in other words the 
usage of neurons available for the input data representation, are optimised during 
the training phase of the network. This is very similar to biological brain maps 
where for instance in the visual cortex a larger cortical area is allocated for 
frequently presented observations. 

Due to its topology preservation, the SOM, unlike many other clustering 
algorithms, keeps similarities of the input data by transforming these into 
neighbourhood relations of the organized map (Fig. 3). In this organizing phase each 
neuron matures to a prototype for a particular (sub-)cluster by adapting its 
own properties (weight vector) to those of a group of similar or identical input 
patterns (input vectors) by using a simple similarity criterion. A neighbourhood 
Fig. 4. SOM-controlled correction of the compression artefacts 




Fig. 5. Error reduction on the basis of example images without SOM 



function works in such a way that neurons situated close to each other on the 
map are representing similar properties. In our case the input layer of the SOM 
is provided by the pixels of the image block. For each input one of the M * N 
output neurons - the winner neuron - defines the represented image class. Fig. 4 
shows the interaction of the SOM with the CODEC providing its input and the 
parameter sets $C_s$ of the auto-associative memory to be activated by the 
winning neuron. For each block class $s = 1, \ldots, M \cdot N$, a parameter set for the 
associative memory is calculated by solving the eigenvalue problems and, similar 
to the former considerations (see Eq. 2), this yields for each block class s: 

Cs = (Us-^^s^) ( 8 ) 
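The interaction between the SOM and the associative memory can be sketched as a winner lookup followed by a class-specific matrix product. This is a simplified illustration, not the authors' code: the Euclidean distance is assumed as the "simple similarity criterion", the SOM weights and the parameter sets `C_sets` are taken as already trained, and the toy example uses 2-element blocks instead of 64 pixels.

```python
import numpy as np

def winner_neuron(weights, block):
    """Index of the neuron whose weight vector is closest to the block.

    weights: array of shape (M * N, d), one weight vector per neuron.
    block:   array of shape (d,), the vectorized image block.
    """
    distances = np.linalg.norm(weights - block, axis=1)
    return int(np.argmin(distances))

def correct_block(block, weights, C_sets):
    """Select the class s by the winner neuron and apply its matrix C_s."""
    s = winner_neuron(weights, block)
    return s, C_sets[s] @ block

# Toy setup: two neurons (block classes) and two correction matrices.
weights = np.array([[0.0, 0.0], [10.0, 10.0]])
C_sets = np.stack([np.eye(2), 2.0 * np.eye(2)])
s, corrected = correct_block(np.array([9.0, 8.0]), weights, C_sets)
```

The number of matrices in `C_sets` equals the number of neurons, so a 4x4 map needs sixteen parameter sets.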



4 Results 

On the basis of three example images, the results of the reduction of the 
compression errors (method of Section 2, without SOM) are shown in 
Fig. 5. The compression error can be reduced by approximately 20 %, depending 
upon the image content and the quality factor, or the compression rate can be 
increased by approximately 25 % without increasing the compression error. 

The proposed procedure for the reduction of the compression artefacts has 
positive effects on further automatic processing of the images. In order to obtain 
further improvements for the human viewer, other criteria must additionally be 
included in the selected error measure. The positive effect of reduced compression 
errors for further automatic processing can be demonstrated by the example 
of text recognition. The recognition error can be reduced by up to 48 % at 
higher compression rates (quality factor less than 10). Typical image classes are 
classified by the SOM. Block-dependent image classes are obtained, and the method 
described above can be applied with varying parameter sets to the associative 
memory. 




Fig. 6. Distribution of the image blocks after the training of a 4x4 SOM, for all 
blocks (left) and ordered by sample images (right); horizontal axis: block classes 

Fig. 7. Compression error without correction, with correction by associative memory 
but without SOM, and correction error with associative memory controlled by SOM of 
different sizes, applied to the sample images (bar groups: JPEG; without SOM; SOM 
with 9 neurons; SOM with 16 neurons; series: Image A, Image B, Image C) 
Fig. 6 shows a training result of the SOM for the example images of Fig. 4. 
The classifying neurons are well distributed. This is an indication that the SOM 
is of reasonable size. 

Fig. 7 shows the gain of quality by introducing the classifying SOM for the 
correcting system. With the varying parameter set for the associative memory, 
the error is reduced by about 20 to 30 % for the gray tone images and by up to 
90 % for the text image. To simplify the complex diagram, only relative 
compression errors are indicated. 

5 Conclusion 

A system for compensation of compression errors by an associative memory 
has been presented. The performance has been increased by considering the image 
(or block) content through a neural network based classification system, an 
SOM. The SOM enables us to apply the same correcting system to different types 
of image and also to images with strongly varying image content (for example 
a mixture of text and photographs). The number of different parameter sets needed 
for correction is equal to the number of neurons in the SOM or the number 
of introduced image classes. In [9], an SOM is proposed as a compression scheme, as 
an alternative to vector quantization, directly generating the codewords. This 
interesting approach for compression with an SOM could also be complemented with 
an SOM-controlled associative memory as proposed in this paper. Depending on 
the codeword, the associative memory is recalled with different parameter sets. 
Abstract. This work investigates a way to exploit the information of satellite 
images in order to identify cartographic features, aiming at developing a soft- 
ware tool able to update digital maps automatically. A cartographic feature, like 
any object present in a multi-channel image, is a set of pixels with similar 
spectral response and a certain spatial relation between them. The current algo- 
rithm works iteratively and mixes the spatial information with the spectral one 
in an appropriate way to finally detect the whole shape of a cartographic feature 
starting from a pixel marked previously by the user in a remotely sensed image. 
It is also shown that Mathematical Morphology (MM) operators can handle the 
spatial and spectral information decreasing the computational cost. First, the 
structure of the main algorithm is presented, showing each step of its opera- 
tional sequence. Then, some application examples are reported and, finally, 
some remarks illustrate the future possibilities of implementation and develop- 
ment of the algorithm. 



1 Introduction 

Objects in multi-channel images, particularly remotely sensed ones, are sets of pixels 
that can be assembled together by their individual spectral response and by their 
neighborhood relationship in the spatial domain of each channel. Although several 
works in the literature propose to control the segmentation through spectral 
information [1] [2] [3], there is no general procedure that combines both types 
of information. Thus, in this paper, we introduce a novel methodology whose main 
objective is to generate objects by assembling small elementary regions 
that are simultaneously spectral and spatial neighbors. Classification 
procedures are usually divided into boundary seeking and region growing [4] [5], 
depending on whether the objects are defined by detecting their edge pixels or as 
regions in which the pixel values are homogeneous. The proposed algorithm first 
divides the image’s spatial domain into small areas and then applies a region 
growing process to the areas rather than directly to the pixels, so the spectral 
information is processed in two steps, on the pixel scale and on the elementary region scale. 

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 580-587, 2003. 
The first partitioning is performed with a watershed [6] transformation on the spectral 
gradient [7] [8] [9], thus obtaining a fine segmentation that preserves the spectral 
meaning of the image. Therefore a unique and representative spectral value can be assigned to 
each area, producing a multispectral mosaic image [6]. On the latter, a region growing 
procedure performs an iterative merging of the regions with similar spectral value, 
starting from the areas marked by the user. The user controls the similarity between 
regions by a previously defined spectral distance parameter. The region merging 
process is much faster than a pixel merging one, even considering the 
computational time of the preliminary watershed. Apart from this, objects created with a 
fixed shape kernel on each pixel usually have a border that reflects this geometry, 
while this procedure maintains the details of the objects’ morphology as perceived by 
human vision. 



2 Fusion Algorithm 

This name recalls the merging, which is the main operation of this methodology. This 
algorithm takes the pieces of the mosaic (the basins of the watershed) that are spatial 
neighbors, and merges them together into a new one. 

This idea is meant for any n-dimensional multi-channel image, but this article 
illustrates it in the bi-dimensional case, on a simple set of two images. This simplification 
maintains the validity of the multi-spectral aspect and is convenient because it requires 
less computation and allows a graphical description of the process. Indeed, 
we can see the projection of mosaic pieces as points in a two dimensional image, 
where their coordinates are their grey levels in each band. It also allows a graphical 
representation of the concepts of spatial and spectral proximity, which is difficult 
in the case of a three-dimensional set and impossible with more dimensions. Finally, the 
bi-dimensionality lets us run the whole algorithm, even when it deals with the 
multi-spectral data, in terms of MM transforms, as the distance calculations in the spectral 
space are performed by the binary dilation. 



2.1 Mosaic Images 

A quick remark on the basic support for this procedure is needed before 
describing in detail the steps of the algorithm. The mosaic image to be used is slightly 
different from the ones found in the literature [6]. Generally, a mosaic image is created 
from a grey level image once this is divided into areas and a unique grey value, 
calculated on the values of its pixels, is assigned to each one. There are different mosaic 
versions that differ in the value assigned to the objects and to the boundary between the 
areas. Many versions [7] [12] have been used, but the one presented here has two 
main characteristics: first, the objects are obtained by applying the watershed on the 
multi-spectral gradient, in order to get a detailed separation into catchment basins with 
homogeneous spectral meaning; second, the watershed line between adjacent 
basins has value zero, in order to keep them separated. The binary image with the 
partition into small areas, viewed as “pieces of the mosaic”, is here called the “pre-mosaic” 
(Fig. 1b). 
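As a concrete illustration of this mosaic variant, the sketch below builds one mosaic channel from a grey level image and a label image. Two points are assumptions of the example: the label image is taken as given (for instance from a watershed of the multi-spectral gradient) with watershed-line pixels labelled 0, and the representative value of each piece is taken to be the mean of its pixels. The function name `mosaic` is illustrative.

```python
import numpy as np

def mosaic(image, labels):
    """Replace every labelled piece by its mean grey value.

    image:  2-D array of grey levels (one channel).
    labels: 2-D integer array, 0 on watershed lines, 1..R inside the pieces.
    """
    image = np.asarray(image, dtype=float)
    labels = np.asarray(labels)
    sums = np.bincount(labels.ravel(), weights=image.ravel())
    counts = np.bincount(labels.ravel())
    means = np.zeros_like(sums)
    np.divide(sums, counts, out=means, where=counts > 0)
    means[0] = 0.0            # watershed lines keep the value zero
    return means[labels]

image = np.array([[2, 4, 9],
                  [6, 8, 10]])
labels = np.array([[1, 1, 0],
                   [1, 2, 2]])
result = mosaic(image, labels)
```

Applying the function to each channel separately yields the set of mosaics used as input in the next section; the array `labels > 0` plays the role of the binary pre-mosaic.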
Fig. 1. Original image (a), catchment basins or “pre-mosaic” (b) and mosaic image (c). 



2.2 Steps of the Algorithm 
Inputs 

First, the set of images must be transformed into mosaics. In this case we have two 
mosaics that were selected from a set of five images. Then a marker image is created. 
It can be obtained manually, marking some points belonging to the requested object, 
or automatically, if at least part of the object can be detected in this way. Basically 
this image should indicate the pieces that belong “a priori” to the required object 
(Fig.2). 




Fig. 2. Reconstruction (c) of the pre-mosaic image (b) by the marker image (a). 

Step 1: Reconstruction of a marker image in the pre-mosaic image: only the pieces of 
the mosaic hit by the marker image (intersection not empty) are considered from the 
“pre-mosaic” (Fig. 2c). 

Step 2: The algorithm considers one by one all the pieces marked in the previous 
step, and for each one of them finds the pieces that are spatial neighbors (Fig. 3b). 

Step 3: The spectral values of a piece, and its spatial neighbors, are visualized as 
points in the spectral space (Fig. 4a, 4b). 
Fig. 3. Spatial domain: single piece (a); spatial neighbors (b). 



Step 4: With an intersection between a dilation of the projection of the piece under 
evaluation and the projections of its spatial neighbors, only the projections of the spa- 
tial and spectral neighbors are selected. The size of the dilation is the key parameter 
of this process and it can be different in each channel depending on the contrast rela- 
tion between the object and the background grey values, which usually varies from 
one channel to the other. In this case (Fig. 4c) it would mean that the dilation could 
have different values in the horizontal and the vertical direction. 




Fig. 4. Spectral domain: projections of the single piece in Fig. 3a (a); projection of its spatial 
neighbors (shown in Fig. 3b) (b); dilation of the single piece projection (c); intersection to 
identify the spectral neighbors that are already spatial neighbors (d). 



Step 5: Knowing the spectral coordinates of the projections of the spatial and spec- 
tral neighbors the algorithm returns to the initial images to identify them (Fig. 5a) and 
merges them with the initial piece into a new bigger one (Fig. 5b). 

Step 6: Each piece grows to a larger shape that is the result of the merging procedure. 
Gradually the new merged pieces modify the mosaics, and update the input images 
for the next loop of the application. The new pieces of the mosaic images have a grey 
value that is an average of the grey levels of the merged pieces, weighted on their ar- 
eas. 

These six steps are applied on each previously marked piece and the whole process is 
looped on all the pieces until the algorithm can find no more spectral neighbors. 
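The loop over Steps 2 to 6 can be sketched on a region adjacency structure instead of on images. The code below is a simplified illustration under stated assumptions, not the authors' MM-based implementation: each piece is described by its grey values in the two channels and its area, spatial neighbourhood is given as a dictionary of adjacent piece identifiers, the dilation of Step 4 is modelled as a rectangular window with one half-width per channel, and only a single marked piece is grown. All names are hypothetical.

```python
def fuse(values, areas, neighbors, seed, radius):
    """Grow one marked piece by merging spatial AND spectral neighbors.

    values:    {piece_id: (grey_band1, grey_band2)}
    areas:     {piece_id: number_of_pixels}
    neighbors: {piece_id: set of spatially adjacent piece_ids}
    seed:      id of the marked piece
    radius:    (r1, r2) half-widths of the spectral window per channel
    """
    values = dict(values)
    areas = dict(areas)
    neighbors = {p: set(n) for p, n in neighbors.items()}
    members = {seed}
    changed = True
    while changed:                      # loop until no spectral neighbor is left
        changed = False
        for nb in sorted(neighbors[seed]):                  # Step 2
            inside = all(abs(a - b) <= r                    # Steps 3 and 4
                         for a, b, r in zip(values[seed], values[nb], radius))
            if not inside:
                continue
            total = areas[seed] + areas[nb]                 # Steps 5 and 6
            values[seed] = tuple((areas[seed] * a + areas[nb] * b) / total
                                 for a, b in zip(values[seed], values[nb]))
            areas[seed] = total
            for other in neighbors.pop(nb):                 # rewire adjacency
                if other != seed:
                    neighbors[other].discard(nb)
                    neighbors[other].add(seed)
                    neighbors[seed].add(other)
            neighbors[seed].discard(nb)
            members.add(nb)
            changed = True
            break                       # restart with the updated grey value
    return members, values[seed]

# Four pieces in a chain 1-2-3-4; piece 4 is spectrally far away.
values = {1: (10, 10), 2: (12, 11), 3: (13, 12), 4: (50, 50)}
areas = {1: 1, 2: 1, 3: 1, 4: 1}
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
members, grey = fuse(values, areas, neighbors, seed=1, radius=(3, 3))
```

In this toy chain the seed absorbs pieces 2 and 3 and stops at piece 4. Using different half-widths per channel corresponds to the remark in Step 4 that the dilation may differ between the horizontal and vertical directions of the spectral space.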
Fig. 5. Spatial domain: identification of the spatial neighbors that are spectral neighbors (a); 
merging procedure (b). 



3 Case Studies 

The developed methodology and algorithms were applied to several remotely sensed 
images; in particular, they were tested on the detection of cartographic features in SPOT 
satellite images over the area of Luanda (Angola). We present in this paper some 
examples concerning different cartographic layers. The first one concerns the study of a 
river (Fig. 6a), which is correctly identified (Fig. 6b). The contours (Fig. 6c) and the 
medial axis (skeleton) (Fig. 6d) of the river, superimposed on the initial image, show 
the success of our approach. This example also gives a hint on how to 
use the algorithm results in updating maps; for example, this would be the first step 
towards creating a hydrology GIS layer. 
Fig. 6. Single channel of cut to object in the original image (a), algorithm detected shape (b), 
boundaries (c) and skeleton of the object (d). 

Using old cartographic features as markers for new ones should theoretically work 
well. Indeed, many human-built cartographic features (roads, urban areas) tend to 
expand over the years, so the old ones are included in the recent objects, but this procedure can 
present some drawbacks. For example, in Fig. 7, because of its slight displacement due 
to the incorrect registration between the map and the satellite images, the old route 
could not be used as a marker, which forced the use of manual markers. 
Fig. 7. Cut of sub-urban area with old cartography overlaid (a), user-made markers (b), shape
detected by the algorithm (c) and boundaries on single channel (d).

The algorithm proved its efficiency in the case of the airport, another object considered
relevant to detect. In Fig. 8 two input mosaic images are shown to
illustrate the different information carried by the channels of the remotely sensed images
and the way in which shapes are highlighted and their grey levels simplified. The input channels
are XS3 and XS1, corresponding to the Blue and the Red bands of SPOT,
and they cover almost all the territory occupied by the Luanda city airport. The
marker image is very simple; it has just 4 pixels marked and was created specifically
to extract the shape of the airstrips. As shown in the figure, these points were
enough to reconstruct the whole shape.




Fig. 8. Input mosaics (a and b), marker image (c) and resulting object (d). 













M. Mengucci and F. Muge 



4 Conclusion and Future Research

This methodology belongs to the class of “region growing” procedures, but its distinguishing
feature is that the elementary regions to be merged are not pixels but pre-segmented
“textural elementary units” [13]. The pieces of the mosaic images, obtained
with MM operators, can be represented as points in the spectral feature space.
For this reason, the region growing procedure becomes much faster and the shapes of the
objects found better suit their morphology as perceived by human vision.

It gives good and useful results and is a good tool for finding connected shapes
with a spectral meaning when applied to remotely sensed images. It is not in its definitive
version and is still liable to be modified to improve its efficiency. Future
improvements will aim at a better interaction with users, and at testing and
comparing different modifications to the structure described in 2.2.

Apart from the marker image, the algorithm has another aspect where user interaction
is quite important: the spectral distance (2.2 Step 4). This is the parameter that
decides whether the projections of the pieces in the spectral space are neighbors or
not. It is a distance parameter that has a component in each band. Future modifications
will improve the semi-automatic definition of these components. The user will be
asked to make some clicks on the object (object clicks) in the image and some on
the background, at points that do not belong to the object “a priori” (background
clicks). In this way the algorithm can be programmed to read the difference between
the values of the clicked pixels in all the bands and calculate statistically the parameters
of the spectral distance in the different channels. The whole process could be
made fully automatic, but user interaction and understanding are always fundamental.
The application of this idea has been mainly dedicated to cartography, but the algorithm
is an object recognition tool that could work on any other multi-channel
digital image.
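
The click-based definition of the spectral distance can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the statistic to be used, so the half-gap between the mean object value and the mean background value in each band is a hypothetical choice, as are the per-band neighborhood test and all function names.

```python
from statistics import mean


def thresholds_from_clicks(object_clicks, background_clicks):
    """Hypothetical heuristic: one tolerance per band, taken as half the gap
    between the mean object value and the mean background value."""
    n_bands = len(object_clicks[0])
    thresholds = []
    for b in range(n_bands):
        obj_mean = mean(px[b] for px in object_clicks)
        bkg_mean = mean(px[b] for px in background_clicks)
        thresholds.append(abs(obj_mean - bkg_mean) / 2)
    return thresholds


def are_spectral_neighbors(piece_a, piece_b, thresholds):
    """Assumed test: two pieces are neighbors in the spectral space when
    their grey values differ by at most the tolerance in every band."""
    return all(abs(a - b) <= t for a, b, t in zip(piece_a, piece_b, thresholds))


# Two-band example with two object clicks and two background clicks.
t = thresholds_from_clicks([(100, 50), (110, 54)], [(160, 52), (170, 60)])
print(t)
print(are_spectral_neighbors((100, 50), (120, 51), t))
```

A band in which object and background have similar values yields a small tolerance, so that band contributes little to merging decisions.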
Abstract. Morphological neural networks consider that the information
entering a neuron is affected additively by a conductivity factor called synaptic
weight. They also suppose that the input channels have a saturation
level mathematically modeled by a MAX or MIN operator. This, from a
physiological point of view, appears closer to reality than the classical neural
model, where the synaptic weight interacts with the input signal by means of a
product and the input channel forms an average of the input signals. In this work
we introduce some geometrical aspects of dendrite processing that make it easy to
visualize the classification regions, and that also provide an intuitive perspective of
the production and training of the net.



1 Introduction 

Neural networks are today a computational alternative for solving problems where an
algorithmic solution is difficult to find or does not exist. Inspired by the functioning of the
nervous system, researchers have postulated different neural processing models.

Recently, it has been found that information processing also occurs at the dendrite
level and not only in the neuron body [4]. This could explain the
efficiency of the nervous system, since information processing practically occurs
on the communication channel. This finding, together with the morphological paradigm,
is the starting point of this research.

1.1 Outline of the Paper 

The remainder of the paper is organized as follows. In Section 2, we briefly discuss
work related to the present research. In Section 3, we describe the methodology
adopted to solve the problem. In Section 4, we provide an example
to explain the functioning of the proposed methodology. Finally, in Section 5, we
conclude and give directions for future research.
2 State of the Art

A neural network can be conceptualized as a non-linear mapping between two
pattern spaces: the input pattern and the output pattern. Normally the internal
parameters of this mapping are determined by a training procedure and they are
called, in most cases, synaptic weights.

In the 1950s Rosenblatt [1] introduced the Perceptron. This classical
model has served as a basis for most current developments.



3 The Adopted Methodology 

3.1 Morphological Neural Processing 

Morphological processing is based on the lattice algebra $(\mathbb{R}, \wedge, +)$, where $\wedge$ is the
MIN operator [2]. The main property of this algebraic structure is the distributivity of
summation with respect to the operator $\wedge$, that is:

$a + (b \wedge c) = (a + b) \wedge (a + c)$    (1)

From the point of view of neural processing, Ritter [4] proposes a neuron model
where the synaptic weights interact additively with the input signals; the dendrites
discriminate by taking into account the minimal value of the incident signals (see
Figure 1).



Fig. 1. Model of a morphological neuron (figure labels: dendrite, output neuron, axonal branch).

In this model each branch can be excitatory or inhibitory (excitatory branches
end with a black circle). The output neuron may have several dendrites; the output
of each of them can be negated or not.

A fundamental difference with respect to the classical neuronal model is that in
morphological processing discrimination among input signals is done by taking into
account a threshold that depends on a min value. In the classical model a weighted
average of the inputs is taken. From the physiological point of view the threshold
criterion appears more acceptable, although the quality of the models, if what we want
to emulate is a biological process, can only be judged through the insights and
scientific experiments in the area.
3.2 Dendrite Computation on Morphological Neurons

One of the contributions of the model described in Section 3.1 is the capacity to
carry out processing practically on the communication channel itself. In this case
the axonal branches can be excitatory or inhibitory; at the moment of contact
with the dendrite, only the MIN of the values remains, that is,

$\bigwedge_{l \in L} (-1)^{(1-l)} (x_i + \omega_{ik}^{l})$    (2)

is the value filtered by dendrite k, where $x_i$ is the input and $\omega_{ik}^{1}$, $\omega_{ik}^{0}$ are the synaptic
excitation and inhibition weights, respectively.

It is worth mentioning that this concept of distributed computing over the
communication channel may be the key that explains the efficiency of the nervous
system, since this model underlines the possibility that the fundamental processing of
information is not executed only in the cell bodies.

In summary, the morphological neural computing model with dendrite processing
has the following characteristics: there are several input neurons and one output neuron;
the output neuron can have several dendrites; and each of the input neurons can excite or
inhibit the corresponding axonal branches. Thus the result, y, of the output neuron is
computed as (see Figure 2):



$y = f\left( \bigwedge_{k=1}^{K} D_k(x) \right)$    (3)

where K is the number of dendrites and $D_k(x)$ is the output of the k-th dendrite when
pattern x is input. Each $D_k(x)$ is obtained as follows:

$D_k(x) = p_k \bigwedge_{i \in I} \bigwedge_{l \in L} (-1)^{(1-l)} (x_i + \omega_{ik}^{l}), \quad x = (x_1, \ldots, x_n) \in \mathbb{R}^n$    (4)

Factor $p_k \in \{-1, 1\}$. The classification function $f(x) = 1$ if and only if its argument
is greater than or equal to zero (on-region), and zero otherwise (off-region). Therefore, the
above-mentioned structure provides a solution to a binary classification problem.

It is worth mentioning that the inhibition signal always carries a negative sign 
independently of the sign of its corresponding synaptic weight. 
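
The computation of the dendrite outputs and of the net output can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the helper `box`, and the four-dendrite net are assumptions. The net was built by hand for the patterns of Example 4.1 (Section 4) and is not the parameter table of the paper. A branch with l = 1 contributes x_i + w1 and a branch with l = 0 contributes -(x_i + w0), and the output fires when the minimum over all dendrites is non-negative.

```python
def dendrite_output(x, w_exc, w_inh, p=1):
    """D_k(x) = p_k * min over inputs i of { x_i + w1_i, -(x_i + w0_i) }."""
    values = []
    for x_i, w1, w0 in zip(x, w_exc, w_inh):
        values.append(x_i + w1)      # excitatory branch, l = 1
        values.append(-(x_i + w0))   # inhibitory branch, l = 0
    return p * min(values)


def classify(x, dendrites):
    """y = f(min_k D_k(x)), with f(z) = 1 if z >= 0 and 0 otherwise."""
    z = min(dendrite_output(x, *d) for d in dendrites)
    return 1 if z >= 0 else 0


def box(lower, upper, p=1):
    """Helper (an assumption, not from the paper): weights of a dendrite whose
    inner minimum is >= 0 exactly on the closed box [lower, upper]."""
    return (tuple(-a for a in lower), tuple(-b for b in upper), p)


# Hypothetical net built by hand for illustration.
NET = [
    box((1, 1), (5, 5)),         # fires on [1,5] x [1,5]
    box((1, 2), (3, 4), p=-1),   # switches off the open box around (2, 3)
    box((2, 3), (4, 5), p=-1),   # switches off the open box around (3, 4)
    box((3, 2), (5, 4), p=-1),   # switches off the open box around (4, 3)
]

ON = [(1, 4), (2, 5), (2, 2), (3, 2), (3, 3), (4, 4), (5, 1)]
OFF = [(2, 3), (3, 4), (4, 3)]
print([classify(x, NET) for x in ON])
print([classify(x, NET) for x in OFF])
```

Because the border of a complemented box still yields a dendrite output of zero, patterns lying exactly on such a border remain in the on-region.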










Fig. 2. Morphological neural computing model. 
3.3 Geometric Approach of Dendrite Morphological Computation

In [4], a theorem is stated and proved that is the basis of the morphological neural
model with dendrite computation. In a few words, this result tells us
that if $X \subset \mathbb{R}^n$ is a compact set of patterns, then there exists a morphological neural
net that classifies X as its on-region and its complement as its off-region.

The above-mentioned theorem comes together with an iterative algorithm that,
given a set X, determines the parameters of the morphological neural net. For
the details refer to [4].

However, the geometrical counterpart of all the analytic and algorithmic
statements does not appear completely clear. From a didactical point of view, and for
the correct assimilation of the concepts, it is desirable to develop as far as possible an
intuitive and geometrical idea of the formal aspects. It is worth mentioning that the
proposed intuitive geometrical vision is not only important for a better understanding
of the concepts; it also allows a construction algorithm to be developed efficiently, as
occurs with the analytical tools, as we will see next.

a. To begin characterizing the geometrical approach, let us consider the effect 
of the axonal branches over the dendrites: 



In this case the firing region consists of the base of the triangle that is
formed when the lines intersect; the complement constitutes the off-region
(in the firing region the operator $\wedge$ takes positive or zero values; in the
off-region it takes negative values).

b. On each dendrite fall axonal branches from several input neurons; each of
them defines an on- or off-region; all of them interact according to the
following expression




(5) 



This expression is the intersection between two lines that cut axis $X_i$ at $-\omega_{ik}^{1}$
and $-\omega_{ik}^{0}$, respectively (see Figure 3).




Fig. 3. Incidence of axonal branches over a dendrite. 



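$D_k(x^{\xi}) = p_k \bigwedge_{i \in I} \bigwedge_{l \in L} (-1)^{(1-l)} (x_i^{\xi} + \omega_{ik}^{l})$    (6)

where $x^{\xi} = (x_1^{\xi}, \ldots, x_n^{\xi}) \in \mathbb{R}^n$ is an input pattern, $\xi = 1, 2, \ldots, m$.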






R. Barron, H. Sossa, and H. Cortes 



Since the intersection is given by the operator $\wedge$ and each pair of axonal branches
corresponds to a different axis $X_i$ $(i = 1, \ldots, n)$, the conjoint firing
regions are formed by the Cartesian products of the corresponding firing regions at
each input (see Figure 4).

When $p_k = -1$, the firing region becomes an off-region and vice versa. It is
important to note that the border of the firing region remains on, even when $p_k = -1$.




Fig. 4. Conjoint firing region. 



Finally, the computation of the output is done as: 



$y(x) = f\left( \bigwedge_{k=1}^{K} D_k(x) \right)$    (7)






This implies that the firing region of the net is the intersection of the regions of 
each dendrite (Figure 5). 



Fig. 5. Firing region of the morphological net (figure labels: firing region of the net, dendrite firing region).



In short, the firing region of the whole net, which is what we want to characterize, is
obtained by forming the firing regions (one for each input variable) and applying
Cartesian products between firing regions or their complements, to finally get, in
general, a hyper-rectangle as the firing region and its complement as the off-region.
All the parameters involved in the above process can be summarized in a table such as the
one shown in Figure 6.

Each row contains the synaptic weights of the axonal branches that fall on the
corresponding dendrite; the last column specifies whether the firing region is
complemented or not.
Fig. 6. Parameters of the morphological net.



4 Experimental Results 

The algorithm in geometrical terms is composed of two steps:

1. Find the hyper-rectangle covering all patterns belonging to class $C_1$ (firing
patterns), even if patterns of class $C_0$ (turn-off patterns) are included.

2. Isolate the points belonging to class $C_0$ in maximal neighborhoods and take
the complement of these neighborhoods so that the neighborhoods become
part of the off-region.
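
Step 1 can be sketched as a bounding-box computation followed by a conversion of the box limits into synaptic weights. The function names are assumptions, and the weight convention (excitatory weight equal to minus the lower limit, inhibitory weight equal to minus the upper limit) is inferred from the additive model of Section 3.2. Step 2, the search for maximal neighborhoods, is not covered by this sketch. The sample patterns are the class $C_1$ patterns of Example 4.1.

```python
def bounding_box(patterns):
    """Smallest axis-aligned hyper-rectangle containing all the patterns."""
    dims = range(len(patterns[0]))
    lower = tuple(min(p[i] for p in patterns) for i in dims)
    upper = tuple(max(p[i] for p in patterns) for i in dims)
    return lower, upper


def box_to_weights(lower, upper):
    """Weights for which x_i + w1_i >= 0 and -(x_i + w0_i) >= 0 hold exactly
    when lower_i <= x_i <= upper_i (assumed convention)."""
    w_exc = tuple(-lo for lo in lower)   # excitatory weights, l = 1
    w_inh = tuple(-up for up in upper)   # inhibitory weights, l = 0
    return w_exc, w_inh


C1 = [(1, 4), (2, 5), (2, 2), (3, 2), (3, 3), (4, 4), (5, 1)]
lower, upper = bounding_box(C1)
print(lower, upper)
print(box_to_weights(lower, upper))
```

For these patterns the box is [1,5] x [1,5], which gives the weights -1 and -5 that appear in the verification of pattern (1,4).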



Example 4.1. Let X = {(1,4), (2,5), (2,2), (3,2), (3,3), (4,4), (5,1)} be the patterns of class $C_1$ (•)
and {(2,3), (3,4), (4,3)} the patterns of class $C_0$ (o):
To verify that the net has been effectively derived, let us classify the patterns (1,4)
and (2,3), belonging to classes $C_1$ and $C_0$, respectively.

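With (1,4), we have:

$D_1(x) = \bigwedge_{i=1}^{2} \bigwedge_{l=0}^{1} (-1)^{(1-l)} (x_i + \omega_{i1}^{l})$
$= \wedge\{ \wedge[(x_1 + \omega_{11}^{1}), -(x_1 + \omega_{11}^{0})] \wedge [(x_2 + \omega_{21}^{1}), -(x_2 + \omega_{21}^{0})] \}$
$= \wedge\{ \wedge[(1 + (-1)), -(1 - 5)] \wedge [(4 - 1), -(4 - 5)] \}$
$= \wedge\{ \wedge[0, 4] \wedge [3, 1] \} = \wedge\{0, 1\} = 0$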
With (2,3), we have:

5 Conclusions and Future Work

It has been shown that the morphological neural model with dendrite computation is
more intuitive from a geometrical point of view. This has been illustrated by the given
example, where the final table can also be obtained by following the algorithm
proposed by Ritter in [4].

As future work, we are developing a visual computational tool that automatically
determines the final parameters of a net, as we did in the example, with
the aim of having a didactic tool to facilitate the training of the
morphological neural model with dendrite computation.
Abstract. In this paper we propose an effective method to summarize document
clusters. This method is based on Testor Theory, and it is applied to a group
of newspaper articles in order to summarize the events that they describe. This
method is also applicable to either a very large document collection or a very
large document, in order to identify the main themes (topics) of the collection
(document) and to summarize them. The results obtained in the experiments
demonstrate the usefulness of the proposed method.



1 Introduction 

Topic Detection and Tracking (TDT) is a new line of research that comprises three
major sub-problems: segmenting speech-recognized TV/radio broadcasts into news
stories, detecting novel events, and tracking the development of an event according to
a given set of sample stories of that event [1]. An event in the TDT context is
something that occurs at a specific place and time associated with some specific
actions [2]. For example, the eruption of Mount Pinatubo on June 15th, 1991 is
considered an event.

Starting from a continuous stream of newspaper articles, the Event Detection 
problem consists in determining for each incoming document, whether it reports on a 
new event, or it belongs to some previously identified event. 

Clustering algorithms, such as K-Means, Single-Pass and others [3, 4, 5], have
traditionally been used in the Event Detection problem. In our approach, we use the
incremental compact algorithm [6, 7] to solve this problem. This algorithm obtains
high quality temporal-semantic clusters of documents, which represent the events of
the collection, and it is independent of the document arrival order.

Another important problem that arises in the event detection systems is that of 
providing summaries for the detected events. Apart from the set of the cluster’s 
frequent terms [8] and the relevant news titles [9], these systems do not offer any 
further information about the events that the generated clusters are representing. 
Consequently, many times users have to read the documents of the clusters to know 
the events they report. In the literature, the problem of summarizing a set of input 
documents (called multidocument summarization) has received much attention lately 
(e.g. [10, 11]). 
Basically, a multidocument summarization system tries to determine which
sentences must be included in the summary, and then how to organise them to make 
the summary comprehensible. Many of these approaches are based on a sentence 
weight function that takes into account the position of the sentences in the documents, 
the length of the sentences, and the number of frequent keywords of the set of 
documents they include [12]. In this way, all the sentences in the document cluster 
must be scored to select the most appropriate for the summary. One of the main 
drawbacks of the current scoring procedures is that they are slow because the weight 
of a sentence depends on whether other sentences have been selected or not [13]. 

In this paper we present a novel and effective method for the multidocument
summarization problem, based on Testor Theory [14]. Starting from a set of
document clusters, each one representing a different event or topic, our method tries
to select the frequent terms of each cluster that are not included in the other clusters
(testors). These terms are usually tightly related to the event of the cluster. Once these
terms have been selected, the system extracts all the sentences that contain the
selected terms. Finally, the system orders the extracted sentences and produces the
cluster’s summary from them.

The proposed method runs very fast, and it produces good summaries for the
document clusters we have analysed. Unlike the other methods in the literature, the
selection of sentences is based on the discriminating frequent terms of each cluster,
which can be efficiently computed.



2 Document Representation 

The incoming stream of documents that feeds our system comes from some on-line
newspapers available on the Internet, which are automatically translated into XML. This
representation preserves the original logical structure of the newspapers.

From them, our detection system builds three feature vectors to represent each
document $d^i$, namely [7]:

• A vector of weighted terms $(TF_1^i, \ldots, TF_n^i)$, where the terms represent the lemmas
of the words appearing in the content of the document, and $TF_k^i$ is the relative
frequency of the term $t_k$ in $d^i$. Stop words are disregarded from this vector.

• A vector of weighted time entities, where time entities are either dates or date 
intervals. These time entities are automatically extracted from the content of the 
documents by using the algorithm presented in [15]. 

• A vector of weighted places. These places are automatically extracted from the 
content of the documents by using a thesaurus of place names. 

The automatic construction of cluster summaries must take into account these three
components. In [15], it was shown how the temporal entities of a document (cluster)
can be summarized as a date interval called event-time period. Places can be easily
summarized by taking a representative place from the cluster documents. Thus, in this
paper we only focus on the term vector to extract the cluster summaries.



A. Pons-Porrata, J. Ruiz-Shulcloper, and R. Berlanga-Llavori



3 Basic Concepts 

Before presenting our summarization method, we review the main definitions of the 
Testor Theory [14] and we define the basic concepts of this method. 

In our problem, the collection of news is partitioned into clusters. Each cluster
represents an event. Let $\zeta$ be a set of detected events in a news collection.

The representative of a cluster $c \in \zeta$, denoted as $\bar{c}$, is calculated as the union of
the documents of that cluster, that is, $\bar{c} = (TF_1^{\bar{c}}, \ldots, TF_n^{\bar{c}})$, where $TF_j^{\bar{c}}$ is the relative
frequency of term $t_j$ in the sum vector of the documents of that cluster.

Given a cluster c, let $T(c) = \{t_1, \ldots, t_{k_c}\}$ be the most frequent terms in the
representative $\bar{c}$, i.e., the terms $t_j$ such that $TF_j^{\bar{c}} \geq \varepsilon$, $j = 1, \ldots, k_c$, where $\varepsilon$ is a
user-defined parameter.

For each cluster c, we construct a matrix MR(c), whose columns are the terms of
T(c) and whose rows are the representatives of all clusters of $\zeta$, described in terms of
these columns. Note that this matrix is different for each cluster.

In Testor Theory, the set $T = \{x_{i_1}, \ldots, x_{i_k}\}$ of features and their corresponding
columns $\{i_1, \ldots, i_k\}$ of a matrix M is called a testor if, after deleting from M all columns
except $\{i_1, \ldots, i_k\}$, all rows of M corresponding to distinct clusters are different. A testor
is called irreducible (typical) if none of its proper subsets is a testor [14]. The length
of a testor is the cardinality of T.
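
The definitions of testor and typical testor can be illustrated with a brute-force sketch. It is not the LEX algorithm used in this work: it simply enumerates column subsets by increasing size, which is only practical for small matrices. The function names, the default comparison criterion (plain inequality of values) and the sample matrix are assumptions made for illustration.

```python
from itertools import combinations


def is_testor(matrix, classes, cols, differ=lambda a, b: a != b):
    """True if every pair of rows from distinct classes differs in at least
    one of the selected columns, according to the comparison criterion."""
    for (row_a, cls_a), (row_b, cls_b) in combinations(zip(matrix, classes), 2):
        if cls_a != cls_b and not any(differ(row_a[j], row_b[j]) for j in cols):
            return False
    return True


def typical_testors(matrix, classes, differ=lambda a, b: a != b):
    """All irreducible testors, found by exhaustive search (exponential cost)."""
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # A set containing a smaller testor cannot be irreducible.
            if any(set(t) <= set(cols) for t in found):
                continue
            if is_testor(matrix, classes, cols, differ):
                found.append(cols)
    return found


# Hypothetical 3 x 3 matrix with one row per class.
M = [[1, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]
print(typical_testors(M, classes=[0, 1, 2]))
```

The `differ` argument is the place where a thresholded comparison criterion would be plugged in instead of plain inequality.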

For the calculation of the typical testors of a matrix M, the key concept is the
comparison criterion of the values of each feature. One of these criteria is, for
example:

(1)

where $v_{ik}$, $v_{jk}$ are the values in rows i and j of the column corresponding to the
feature $x_k$, respectively, and $\delta$ is a user-defined parameter.

In order to determine all typical testors of a matrix we use the algorithm LEX, which
is described in detail in [16]. This algorithm outperforms the other algorithms for
calculating the typical testors.

Let S be a sentence of a document d and U be a set of terms. We call the maximal
co-occurrence of S with respect to U, denoted mc(S,U), the set of all
terms of U that also occur in S.



4 Method of Summarization 

In our opinion, a summary of an event should include the terms that characterize the 
event, but also those that distinguish this event from the rest. 

A summary of an event c consists of a set of sentences extracted from the
documents in c, in which the highest quantity of terms that belong to typical testors of
the maximum length of the matrix MR(c) occurs. Moreover, the sentences that cover
the calculated typical testor set are also added to the summary.

In order to improve the coherence and organization of the summaries, we sort the 
extracted sentences according to the publication date of the news and their position in 
the text. 

In order to calculate these typical testors, we consider two classes in the matrix
MR(c). The first class is formed only by $\bar{c}$ and the second one is formed by all
remaining cluster representatives. The comparison criterion applied to all the features
is that of (1). Notice that this criterion requires that the terms frequently appear in the
cluster documents but not in the other clusters.

The proposed summarization method is described as follows: 



Algorithm Summarization of event set
Input: a set of events (clusters) of a news collection,
       ε: threshold of term frequencies,
       δ: parameter of the comparison criterion.
Output: Summary of each event.

For each event c in the input set:
  1. Construct the matrix MR(c).
  2. Calculate the typical testors of the maximum length in the matrix MR(c).
  3. Let U be the union of all typical testors found in step 2.
  4. For each document d_i in the cluster c:
       For each sentence S in d_i:
         Calculate the maximal co-occurrence mc(S,U).
  5. Sort all sentences in decreasing order of the cardinality of their maximal co-occurrence.
     Let p_1 > p_2 > ... > p_s be these cardinalities.
  6. Summary(c) = ∅.
  7. Add to Summary(c) all sentences S that satisfy |mc(S,U)| = p_1.
  8. W = ∪ mc(S,U), the union being taken over all sentences S with |mc(S,U)| = p_1.
  9. i = 2.
  10. While U \ W ≠ ∅ do:
        V = ∅.
        For each mc(S,U) of cardinality p_i:
          If mc(S,U) ∩ (U \ W) ≠ ∅ then
            Add S to Summary(c).
            V = V ∪ mc(S,U).
        i = i + 1.
        W = W ∪ V.
  11. Sort all sentences in Summary(c) according to the publication date of the news in c and their
      position in the text.
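
Steps 4 to 11 of the algorithm can be sketched as follows, assuming that the set U of typical-testor terms has already been computed. Several details are assumptions made for illustration: the function name, the document layout (a publication date in ISO format plus a list of sentences), and the naive lowercase tokenization, which stands in for the lemmatization used by the system.

```python
def summarize(documents, testor_terms):
    """documents: list of (publication_date, [sentence, ...]) pairs.
    testor_terms: the set U of terms of the typical testors."""
    U = frozenset(t.lower() for t in testor_terms)
    entries = []  # (date, position in text, sentence, mc(S, U))
    for date, sentences in documents:
        for pos, sentence in enumerate(sentences):
            # Naive tokenization (assumption): split on spaces, strip punctuation.
            terms = {w.strip('.,;:"\'()').lower() for w in sentence.split()}
            entries.append((date, pos, sentence, frozenset(terms) & U))

    # Distinct non-zero cardinalities p_1 > p_2 > ... > p_s.
    cards = sorted({len(e[3]) for e in entries if e[3]}, reverse=True)
    if not cards:
        return []

    summary = [e for e in entries if len(e[3]) == cards[0]]
    covered = set().union(*(e[3] for e in summary))      # W
    i = 1
    while U - covered and i < len(cards):
        newly = set()                                    # V
        for e in entries:
            if len(e[3]) == cards[i] and e[3] & (U - covered):
                summary.append(e)
                newly |= e[3]
        i += 1
        covered |= newly

    # Order by publication date, then by position in the text.
    summary.sort(key=lambda e: (e[0], e[1]))
    return [e[2] for e in summary]


docs = [
    ("1999-06-08", ["The murder shocked Mexican society.", "Nothing else happened."]),
    ("1999-06-07", ["Television covered it."]),
]
print(summarize(docs, {"murder", "mexican", "television"}))
```

The loop stops either when every term of U is covered or when no lower cardinality remains, which guards against terms of U that occur in no sentence.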



The paragraph is a useful semantic unit for the construction of the summaries because 
most writers view a paragraph as a topical unit, and organize their thoughts 
accordingly. Therefore, if we want to obtain more extensive summaries, we can use 
the paragraphs instead of the sentences. Thus, we would extract the paragraphs that 
cover the typical testor set. 
5 Experiments and Results



The effectiveness of the proposed summarization method has been evaluated using
two collections. The first one contains 554 articles published in the Spanish
newspaper "El Pais" during June 1999. We have identified 85 non-unitary events,
the largest of which contains 18 documents. The collection covers 21 events
associated with the end of the “Kosovo War” along with their immediate consequences,
the visit of the Pope to Poland, the elections in several countries such as South Africa
and Indonesia, the European elections, and the events related to the trials of Pinochet
and the Kurdish leader Ocalan, among others.

In order to show the quality of the obtained summaries, one detected event is selected.
It concerns the murder of the famous Mexican presenter Paco Stanley. This cluster is
formed by 5 documents. Table 1 shows the titles, the publication dates and the length
(number of words) of each document in this cluster.



Table 1. Documents about the murder of Paco Stanley.

Publication date   Title                                                                Length
1999-6-8           Commotion in Mexico for the murder of a famous presenter.            531
1999-6-9           The Mexican Intelligence declares that the murdered comedian had
                   links with the drug traffic.                                         748
1999-6-9           The televisions incite the indignation against Cardenas.             298
1999-6-10          An atmosphere of collective hysteria has been created.               203
1999-6-10          The death of Stanley agitates the political atmosphere in Mexico.    540


The union of the typical testors found for this cluster is: {murder, Mexican, death}.
The summary of this event obtained by our method is the following one:



Paco Stanley presented several popular variety programs in the Aztec Television, and its
death has caused a deep commotion in the Mexican society, in which he was a very
appreciated person. Porfirio Munoz Ledo, the mayor's competitor in the PRD internal fight
for the presidential nomination of the party in the presidential elections of the 2000,
annotated new causes in the murder of the showman, as he declared that 'it would be
immoral' that this notorious death could affect the election process, in which Cardenas
aspires to the Republic presidency. During hours, the reporter Raúl Trejo explained, the only
thing that was seen in the Mexican television screens, after murder of Paco Stanley, was a
parade of laments, complaints and demands that nothing clarified, but that they became a
battering ram.



As we can see, this summary offers a concise view of the main aspects of the murder
of Paco Stanley. This summary has 139 words, in contrast with the 2320 words in
total of all documents of the cluster, that is, a compression rate of 94%. It is
worth mentioning that we tried to preserve the original fragments in the
translation of this summary.
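
The compression rate quoted above can be checked directly, assuming (as the figures suggest) that it is defined as one minus the ratio of the summary length to the total length of the cluster; the total is the sum of the document lengths listed in Table 1:

```latex
\[
531 + 748 + 298 + 203 + 540 = 2320, \qquad
1 - \frac{139}{2320} \approx 1 - 0.0599 = 0.9401 \approx 94\%.
\]
```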

Figure 1 shows the compression rate (in %) of each event with
respect to its size (total number of words in the documents of the event). As we can
see, there is a tendency for the compression rate to increase with the event size.

Fig. 1. Compression rate against the total number of words in each event.



In order to evaluate the effectiveness of our method, we also used the data (in 
Spanish) from the TREC-4 conference [17]. This collection contains 693 articles 
published by AFP agency during 1994. These articles are classified into 23 topics. 

Table 2 shows for each topic, its size (number of documents), the size of its typical 
testor set (number of terms) and the obtained compression rate (in %). 



Table 2. Obtained results in TREC data.

Topic  Size  TT  Rate      Topic  Size  TT  Rate      Topic  Size  TT  Rate
SP51     83   4  96.81     SP60     47   7  97.94     SP69     62   4  98.06
SP52     13   3  98.20     SP62     15   6  85.72     SP70      6   3  74.76
SP53     46   2  99.36     SP63      5   4  83.43     SP71     17   3  99.29
SP54     37   8  94.38     SP64      9   4  83.54     SP72     14   7  91.05
SP55    108   4  99.29     SP65     29   5  96.40     SP73     13   5  93.48
SP57      2   3  82.05     SP66     68   4  97.32     SP74     34  10  97.65
SP58     49   9  97.91     SP67     13   9  87.20     SP75     20   3  94.09
SP59      7   6  93.74     SP68     12   8  93.78





Again, the obtained summaries capture the main ideas of each topic and an
appreciable reduction of words is also achieved. For example, the description given
by TREC for the topic SP68 is “AIDS situation in Argentine and what steps is the
Argentine government taking to combat the disease”. The union of the typical
testors found for this topic is: {Argentine, campaign, prevention, disease, population, drug,
case, people}. The summary obtained for this topic is the following one:



The combination of the DDI drugs and hidroxamates against the AIDS disease “could be the
only way to remove the virus from the seropositive people”, the argentinean doctor Julio
Vila, which leads a research group in France, said this Sunday. The Health authorities
announced this Friday that the Argentine Government will start a educational campaign to
prevent the Acquired Immune Deficiency Syndrome (AIDS). The minister of Health and
Social Action, Alberto Mazza, had announced this Friday that the government will start a
educational campaign, whose objective is “to instruct the population about the prevention
measures that must be adopted to combat the AIDS”. The campaign of preventive education
will be started after publishing the results of a national public-opinion poll about the
population knowledge on the disease, according to Mazza declarations. Although detected
during the last years, the cases were just known now, pointing out that the situation has
produced a prevention campaign ordered by the army chief General Martin Balza, which
presumes that we are against a "serious but controllable problem". In the same way, the
Army Immune Deficiency Center (CEIDE) was created within the Central Army Hospital as
an institution that will be in charge of the prevention campaigns, as well as the treatment
and monitoring of the AIDS cases.



Indeed, it is hard to evaluate the quality of a summarization method. In spite of this, 
we consider our summaries readable, coherent and excellent at capturing the main 
themes of the document sets. Thus, we believe that these summaries can be presented 
to the user as a meaningful description of the cluster contents. 



6 Conclusions 

In this work we presented an effective method to summarize document clusters 
generated by Topic Detection Systems. The proposed method employs the calculus of 
typical testors as its primary operation and from them, it constructs the summaries of 
each cluster. 

The most important novelty is the use of typical testors combined with different 
techniques and heuristics, to produce all together better summaries. 

This method enables construction of a concise representation of the focused 
cluster. The obtained summaries are much more descriptive than simple sets of 
frequent words. 

The proposed method was applied to a set of newspaper articles in order to
summarize the events that they describe. It helps a user determine at a glance
whether the content of an event is of interest. The experiments carried out
demonstrate the usefulness of the method. The summaries are readable, coherent and
well organized. In most cases, the system successfully presents main themes, skips
over minor details, and avoids redundancy. Additionally, the proposed summarization
algorithm performs efficiently, taking much less time than the clustering process.

To sum up, the summarization method is robust and topic-independent, and may easily
be applied in other domains and other languages. It can also be applied to
other document collections such as Web pages, books, and so on. For example, in a
book we can treat structural elements of the document (chapters, sections, etc.)
as clusters whose members are the sub-structures they contain (subsections,
paragraphs, etc.).
Abstract. The problem of Prepositional Phrase (PP) attachment disambiguation
consists in determining whether a PP is part of a noun phrase, as in He sees the
room with books, or an argument of a verb, as in He fills the room with books. Volk
has proposed two variants of a method that queries an Internet search engine to
find the most probable attachment variant. In this paper we apply the latest
variant of Volk's method to Spanish, with several differences that allow us to
attain better performance, close to that of statistical methods using treebanks.



1 Introduction 

In many languages, prepositional phrases (PP) such as in the garden can be attached
to noun phrases (NP): the grasshopper in the garden, or verb phrases (VP): plays in
the garden. Sometimes there are several possible variants for attachment of a given
PP. For example, in The police accused the man of robbery we can consider two
possibilities:



(1) The police [accused [the man of robbery]] 

(2) The police [accused [the man] of robbery] 

In case (1) the object of the verb is the man of robbery, and in (2) the object is the
man, and the accusation is of robbery. An English speaker knows that the second
option is the correct one, whereas a computer needs a method to determine
automatically which option is correct.

There are several methods to find the correct PP attachment place that are based on
treebank statistics. These methods have been reported to achieve up to 84.5%
accuracy [1], [2], [3], [4], [5], [6]. However, resources such as treebanks are not
available for many languages and they are difficult to port, so a less
resource-demanding method is desirable. Ratnaparkhi [7] describes a method that
requires only a part-of-speech tagger and morphological information. His method is
trained on raw text.

The quality of the training corpus significantly determines the correctness of the
results. In particular, to reduce the effects of noise in a corpus and to cover most
of the phenomena, a very large corpus is desirable. Eric Brill [8] shows that it is
possible to achieve state-of-the-art accuracy with relatively simple methods whose
power comes from the plethora of texts available to such systems. His paper also
gives examples of several NLP applications that benefit from the use of very large
corpora.

Nowadays, large corpora comprise more than 100 million words, whereas the Web 
can be seen as the largest corpus with more than one billion documents. Particularly 
for Spanish, Bolshakov and Galicia-Haro [9] report approximately 12,400,000 pages 
that can be found through Google. We can consider the Web as a corpus that is big 
and diverse enough to obtain better results with statistical methods for NLP. 

Using the Web as corpus is a recently growing trend; an overview of the existing 
research that tries to harness the potential of the web for NLP can be found in [10]. In 
particular, for the problem of finding the correct PP attachment, Volk [11], [12] pro- 
poses variants of a method that queries an Internet search engine to find the most 
probable PP attachment. 

In this paper we show the results of applying the latest variant of Volk's method
to Spanish, with several differences. In Section 2 we explain the variants of Volk's
method. In Section 3 we present the differences between the method we use and
his method. In Section 4 we explain the details of our experiment and the results we
obtained, and finally we draw the conclusions.



2 Volk’s Method 

Volk proposes two variants of a method to decide the attachment of a PP to an NP or a
verb. In this section we explain both variants and their results.



2.1 First Variant 

Volk [11] proposes disambiguating PP attachments using the web as corpus by con- 
sidering the co-occurrence frequencies (freq) of verb + preposition against those of 
noun + preposition. The formula used to calculate the co-occurrence is: 

cooc(X,P) = freq(X,P) / freq (X) 

where X can be either a noun or a verb. For example, for He fills the room with books,
N = room, P = with, and V = fill. The value of cooc(X,P) is between 0 (no
co-occurrences found) and 1 (the words always occur together).

The value of freq(X,P) is calculated by querying the AltaVista search engine
using the NEAR operator: freq(X,P) = query("X NEAR P").

To choose an attachment variant, cooc(N,P) and cooc(V,P) are calculated, and the
variant with the higher value is chosen. If either cooc value is lower than a
minimum co-occurrence threshold, the attachment cannot be disambiguated, and thus
it is not covered. By adjusting the minimum co-occurrence threshold, Volk's 2000
algorithm can attain very good coverage but poor accuracy, or good accuracy with
low coverage. Table 1 shows the coverage / accuracy values for Volk's experiments.

Table 1. Coverage and accuracy for Volk's 2000 algorithm

threshold    coverage    accuracy
0.1          99%         68%
0.3          36.7%       75%
0.5          7.7%        82%
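
The decision rule of the first variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the hit counts are hypothetical, no search engine is queried, and the handling of ties (resolved here in favour of the verb) is not specified in the text.

```python
from typing import Optional

def cooc(freq_xp: int, freq_x: int) -> float:
    """cooc(X,P) = freq(X,P) / freq(X); 0.0 when X itself was never found."""
    return freq_xp / freq_x if freq_x > 0 else 0.0

def decide_attachment(cooc_noun: float, cooc_verb: float,
                      threshold: float) -> Optional[str]:
    """Return 'noun' or 'verb', or None when the case is not covered."""
    # A case is uncovered as soon as one value falls below the threshold.
    if cooc_noun < threshold or cooc_verb < threshold:
        return None
    # Assumption: ties go to the verb; the source only says "the higher value".
    return "noun" if cooc_noun > cooc_verb else "verb"

# Hypothetical hit counts for "He fills the room with books".
cooc_room_with = cooc(freq_xp=20, freq_x=100)   # noun + preposition
cooc_fill_with = cooc(freq_xp=60, freq_x=100)   # verb + preposition
decision = decide_attachment(cooc_room_with, cooc_fill_with, threshold=0.1)
```

Raising `threshold` makes more cases return `None`, which is the coverage / accuracy trade-off reported in Table 1.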

Volk [11] also concludes that using full forms is better than using lemmas. 

The same experiment has been done for Dutch by Vandeghinste [13], reaching an
accuracy of 58.4% for a coverage of 100%. To obtain an accuracy of 75%,
Vandeghinste used a threshold of 0.606, yielding a coverage of only 21.6%.



2.2 Second Variant 

In a subsequent paper [12], Volk uses a different formula to calculate co-occurrences.
Now the head noun of the PP is included within the queries. The formula used is:

cooc(X,P,N2) = freq(X,P,N2) / freq(X)

where freq(X,P,N2) is calculated by querying the AltaVista search engine using the
NEAR operator: freq(X,P,N2) = query("X NEAR P NEAR N2"). X can be N1 or V.
For example, for He fills the room with books, N1 = room, P = with, N2 = books and
V = fill.

Volk experiments first by requiring that both cooc(N1,P,N2) and cooc(V,P,N2) can
be calculated to determine a result. Then he considers using a threshold to determine
the PP attachment when one of cooc(N1,P,N2) or cooc(V,P,N2) is not known. That is,
if cooc(N1,P,N2) is not known, cooc(V,P,N2) must be higher than the threshold to
decide that the PP is attached to the verb, and vice versa. Afterwards, by including
both lemmas and full forms in queries, Volk attains better performance, and by
defaulting to noun attachment for previously uncovered attachments, he attains a
coverage of 100%. His results are shown in Table 2.

Table 2. Results of Volk's 2001 method

coverage  accuracy  requires both      threshold when    includes both    defaults to noun
                    cooc(N1,P,N2) and  cooc(N1,P,N2) or  lemmas and full  attachment for
                    cooc(V,P,N2)?      cooc(V,P,N2)      forms in         uncovered
                                       is not known      queries?         attachments?
55%       74.32%    yes
63%       75.04%                       0.001
71%       75.59%                       0.001             yes
85%       74.23%                       0                 yes
100%      73.08%                       0                 yes              yes

For Dutch, requiring both cooc(N1,P,N2) and cooc(V,P,N2), Vandeghinste
achieves a coverage of 50.2% with an accuracy of 68.92%. Using a threshold and
including both lemmas and full forms in queries, he reaches 27% coverage for an
accuracy of 75%. For 100% coverage, defaulting the previously uncovered cases to noun
attachments, an accuracy of 73.08% is obtained.
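
The successive relaxations described for the second variant (both values required, then a threshold when one value is unknown, then a default to noun attachment) can be sketched as a single decision function. The function name, the use of `None` for an unknown value and the tie handling are assumptions made for illustration; the co-occurrence values would come from search-engine queries that are not performed here.

```python
from typing import Optional

def decide_attachment_v2(cooc_noun: Optional[float],
                         cooc_verb: Optional[float],
                         threshold: float = 0.001,
                         default_to_noun: bool = False) -> Optional[str]:
    """Return 'noun', 'verb', or None when the case stays uncovered."""
    # Both values known: the higher one wins (ties go to the verb, an assumption).
    if cooc_noun is not None and cooc_verb is not None:
        return "noun" if cooc_noun > cooc_verb else "verb"
    # Only one value known: it must be higher than the threshold.
    if cooc_verb is not None and cooc_verb > threshold:
        return "verb"
    if cooc_noun is not None and cooc_noun > threshold:
        return "noun"
    # Still undecided: optionally fall back to noun attachment (100% coverage).
    return "noun" if default_to_noun else None
```

With `default_to_noun=True` every case receives an answer, which corresponds to the last row of Table 2.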



3 Improving Performance 

Methods to resolve PP attachment ambiguity based on treebank statistics achieve by
far a better performance than the experiments described above. Nonetheless, we think
that there are several elements that could be changed to improve methods based on
Web queries. One of the elements to consider is the size of the document database of
search engines. Indeed, this is relevant for finding representative co-occurrence
frequencies for a given language. It is known that not every search engine yields the
same results. For example, Table 3 shows the number of co-occurrences found by
different search engines for the same words:

Table 3. Number of co-occurrences found in several search engines

               leer en el metro    read in the subway
Google         104                 30
All-the-Web    56                  23
AltaVista      34                  16
Teoma          15                  19

Google is ranked as the search engine with the largest database size by the Search
Engine Showdown.* Because of its larger document database, we have determined
that using Google to obtain word co-occurrence frequencies can yield better results.

Another element to consider is the use of the NEAR operator. We decided not to
use it, since it does not guarantee that the query words appear in the same
sentence. Let us consider the following queries from AltaVista:

(1) wash NEAR with NEAR door      6,395 results

(2) wash NEAR with NEAR bleach    6,252 results

Query (1) yields 6,395 pages, even though doors are unrelated to the wash operation.
Compared to (2), which yields 6,252 pages, we can see that there is no clear
distinction of when a preposition + noun is related to a verb. On the other hand, an
exact phrase search yields 0 results for wash with door, which marks a clear
distinction between wash with door and wash with bleach. The numbers of pages found
are as follows:



Exact phrase search      AltaVista    Google
"wash with door"         0            0
"wash with bleach"       100          202



Following [12], we jointly use full forms and lemmatized forms of nouns and verbs
to obtain better performance. However, as we are not using the NEAR operator, we
must consider the determiners that can be placed between the preposition and the
noun of the PP. We also consider that the nucleus of the PP might appear in the
plural without affecting its use. To illustrate this, consider the following
sentence²:

Veo al gato con un telescopio 'I see the cat with a telescope'

The attachments are calculated by the queries shown in Table 4. Since
freq(veo, con, telescopio) > freq(gato, con, telescopio), the attachment is
disambiguated as veo con telescopio 'see with telescope'.

Table 4. Queries to determine the PP attachment of Spanish Veo al gato con un
telescopio and English I see the cat with a telescope

Veo al gato con un telescopio  hits          I see the cat with a telescope  hits
ver                            296,000       see                             194,000,000
"ver con telescopio"           8             "see with telescope"            13
"ver con telescopios"          32            "see with telescopes"           76
"ver con un telescopio"        49            "see with a telescope"          403
"ver con el telescopio"        23            "see with the telescope"        148
"ver con unos telescopios"     0             "see with some telescopes"      0
"ver con los telescopios"      7             "see with the telescopes"       14
veo                            642,000       (no such forms in English)
"veo con telescopio"           0
"veo con telescopios"          0
"veo con un telescopio"        0
"veo con unos telescopios"     0
"veo con el telescopio"        1
"veo con los telescopios"      0
freq(veo,con,telescopio) =     1.279×10⁻⁴    freq(see,with,telescope) =      3.371×10⁻⁶
gato                           185,000       cat                             24,100,000
"gato con telescopio"          0             "cat with telescope"            0
"gato con telescopios"         0             "cat with telescopes"           0
"gato con un telescopio"       3             "cat with a telescope"          9
"gato con unos telescopios"    0             "cat with some telescopes"      0
"gato con el telescopio"       6             "cat with the telescope"        2
"gato con los telescopios"     0             "cat with the telescopes"       0
freq(gato,con,telescopio) =    0.486×10⁻⁴    freq(cat,with,telescope) =      0.456×10⁻⁶

* Information taken from www.searchengineshowdown.com, update of December 31st, 2002.
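
The figures in Table 4 can be reproduced with a short script. This is a sketch, not the authors' program: the hit counts are copied from the Spanish side of the table, no search engine is queried, the function names are hypothetical, and the aggregation used (total phrase hits divided by total hits of the head forms) is an assumption inferred from the reported values.

```python
# Noun-phrase variants: bare, plural, and with indefinite / definite articles.
NOUN_PHRASES = ["telescopio", "telescopios", "un telescopio",
                "unos telescopios", "el telescopio", "los telescopios"]

def phrase_queries(heads, preposition, noun_phrases):
    """Exact-phrase queries: every head form combined with every variant."""
    return [f'"{head} {preposition} {np}"'
            for head in heads for np in noun_phrases]

def normalized_freq(phrase_hits, head_hits):
    """Total phrase hits over total hits of the head forms (assumed formula)."""
    return sum(phrase_hits) / sum(head_hits)

verb_queries = phrase_queries(["ver", "veo"], "con", NOUN_PHRASES)
# Hits in the same order as verb_queries (lemma "ver", then full form "veo").
verb_phrase_hits = [8, 32, 49, 0, 23, 7,
                    0, 0, 0, 0, 1, 0]
verb_head_hits = [296_000, 642_000]        # ver, veo

noun_phrase_hits = [0, 0, 3, 0, 6, 0]      # "gato con ..."
noun_head_hits = [185_000]                 # gato

freq_verb = normalized_freq(verb_phrase_hits, verb_head_hits)
freq_noun = normalized_freq(noun_phrase_hits, noun_head_hits)
attachment = "verb" if freq_verb > freq_noun else "noun"
```

Under these assumptions `freq_verb` is about 1.279×10⁻⁴ and `freq_noun` about 0.486×10⁻⁴, so the PP is attached to the verb, as in the text.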



4 Experiment and Results 

For our evaluation we randomly extracted 100 sentences from the LEXESP corpus of
Spanish [15] and the newspaper Milenio Diario³. All searches were restricted to
Spanish pages.

First, we considered not restricting queries to a specific language, given that a
benefit could be obtained from similar words across languages, such as French and
Spanish. For example, the phrase responsables de la debacle 'those responsible for
the debacle' is used in both languages, varying only in its accentuation (débâcle
in French, debacle in Spanish). As Google does not take into account word
accentuation, results for both languages are returned by the same query. However,
with an unrestricted search, Google returns different counts in its API and in its
GUI. For example, for ver 'to see', its GUI shows 270,000 results, whereas its API
returns more than 20,000,000, even with the "group similar results" filter enabled.
This enormous deviation can be reduced by restricting the search to a specific
language. For Spanish, a restricted search for ver 'to see' in the GUI returns
258,000 results, whereas in the API it returns 296,000. We are currently not aware
of the reason for this difference; in any case, it does not have any serious impact
on our experiments.

² Example borrowed from [14].
³ www.milenio.com

The sentences of our experiment contain 181 cases of preposition attachment
ambiguity. Of those, 162 could be resolved automatically. Manual verification
showed that 149 of them were resolved correctly and 13 incorrectly.

In terms of coverage and accuracy used by Volk, we obtain the coverage of 89.5% 
with an accuracy of 91.97%. Without considering coverage, the overall percentage of 
attachment ambiguities resolved correctly is 82.3%. 
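
The three percentages follow directly from the counts given above. The short check below is only a worked illustration; the variable names are ours.

```python
total_cases = 181   # ambiguous attachments in the 100 sentences
resolved = 162      # cases the method could decide
correct = 149       # decided cases that were right

coverage = resolved / total_cases   # share of cases that received an answer
accuracy = correct / resolved       # share of answers that were right
overall = correct / total_cases     # share of all cases answered correctly

print(f"coverage = {coverage:.4f}")
print(f"accuracy = {accuracy:.4f}")
print(f"overall  = {overall:.4f}")
```

The accuracy ratio 149/162 is approximately 0.91975, which the text reports as 91.97%.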



5 Conclusions 

We have found an increase in performance using Volk’s method with the following 
differences: 

— using exact phrase searches instead of NEAR operator; 

— using a search engine with a larger document database; 

— searching combinations of words that include definite and indefinite articles; and 

— searching for singular and plural forms of words when possible. 

The results obtained with this method (89.5% coverage, 91.97% accuracy, 82.3%
overall) are very close to those obtained by using treebank statistics, without the
need for such expensive resources.

A demo version of a program implementing our method can be found at the web- 
site likufanele.com/ppattach. 
Abstract. A new automatic method, based on an intra-cluster criterion, to obtain
a similarity threshold that generates a well-defined clustering (or one near to it)
for large data sets is proposed. This method uses the connected component
criterion, and it neither calculates nor stores the similarity matrix of the objects
in main memory. The proposed method is focused on the unsupervised Logical
Combinatorial Pattern Recognition approach. In addition, some experiments with
the new method on large data sets are presented.



1 Introduction 

Many algorithms have been proposed in the area of unsupervised Pattern Recognition
[1]. Some of them are based on graph theory. In this paper we consider the
graph-based approach proposed in Logical Combinatorial Pattern Recognition [2, 3].

In this context, it is assumed that the structure of a universe is not known. To find
such structure an initial sample is given, and the problem is precisely to find the
classes, the groupings.

The main idea consists in considering the data as vertices in a graph and the
similarity among the objects as edges. In this way the problem of unsupervised
classification can be seen as finding subgraphs (clusters) in the initial graph
(initial sample).

Note that there exists a natural correspondence among data, their similarity and a
graph whose vertices are objects and whose edge weights are the similarities
between adjacent vertices.

In this context a parameter β₀ can be introduced to control how similar a pair of
objects must be in order to be considered similar. As a result, a new graph
containing only edges with weight greater than or equal to β₀ is obtained.

Therefore, depending on the desired closeness in similarity, the user can choose an
appropriate value for this parameter, and different graphs are obtained.



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 611-618, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 






G. Sánchez-Díaz and J.F. Martínez-Trinidad



Now the problem is reduced to finding certain subgraphs in the resultant graph. For
example, we can find the connected components of the graph for a fixed β₀, where
each vertex is an object of the sample and all the edges have a weight greater than
or equal to β₀. Note that when the value of β₀ is modified the graph may change, and
then the connected components can also change, giving a different clustering for
each value of β₀. Here a natural question arises: what value of β₀ must be chosen?

There are many criteria to find subgraphs (clustering criteria); some of them are
presented in [4], such as β₀-connected components, β₀-compact sets, β₀-strictly
compact sets, and β₀-complete maximal sets.

The problem of choosing an adequate value for β₀, without calculating or storing
a similarity matrix for the objects, is studied in this paper. A new automatic
method to obtain a similarity threshold that generates a well-defined clustering (or
one near to it) for large data sets is proposed. This method is based on the maximum
and minimum values of similarity among objects, and it calculates an intra-cluster
similarity for each cluster. The method then uses this intra-cluster similarity to
obtain a global value, which indicates which clustering has the best average
intra-cluster similarity. The new method uses the GLC algorithm [5], which generates
a clustering based on the connected components criterion for large data sets.



2 Related Works 

In [6] a method to determine the parameter β₀ for a hierarchical algorithm is
proposed. This method uses as similarity between two clusters C_i and C_j the
expression

$$\beta(C_i, C_j) = \max_{O \in C_i,\; O' \in C_j} \{\beta(O, O')\}.$$

Then, using a traditional hierarchical algorithm, i.e., grouping the two most
similar clusters at each level, a dendrogram is built. Using the dendrogram the user
can choose the parameter β₀ that he prefers according to the number of clusters
generated by this β₀ value.

In this case, the user determines the value of β₀ as a function of the number of
clusters that he wants to get by analyzing the dendrogram; the method does not
determine the value of β₀ automatically.

Another related work is [7], where some concepts and theorems to demonstrate the
number of ways in which a sample of objects can be partitioned into β₀-connected
components are introduced. The values of β₀ that characterize the different ways of
partitioning the sample and the cardinality of each component are also studied.

In that work, an algorithm is proposed that, given a number 0<k<m (where m is the
number of objects in the sample), generates the k β₀-connected components.

Note that this method is very useful if the number of β₀-connected components to
form is known; otherwise, when the number of clusters to form is an unknown of the
problem, the method cannot be used.

However, these techniques were not developed to process large data sets.

The problem of determining the value of β₀ that generates natural clusters
(well-defined clusters) is very important in the context of the Logical
Combinatorial Pattern Recognition approach. Therefore, in the next sections a new
method to estimate β₀ for large data sets is introduced.



3 Basic Concepts 

In this section, the context of an unsupervised classification problem in Logical
Combinatorial Pattern Recognition is explained, and some basic concepts that our
method uses to determine β₀ are introduced.

Let Ω={O_1,O_2,...,O_m} be a set of objects and R={x_1,x_2,...,x_n} a set of
features. A description I(O) is defined for every O∈Ω and is represented by an
n-tuple, e.g. I(O)=(x_1(O),...,x_n(O)) ∈ D_1×...×D_n (initial representation
space), where x_i(O)∈D_i, i=1,...,n, and D_i is the domain of admissible values for
the feature x_i. D_i can be a set of nominal, ordinal and/or numerical values.

Hereafter we will use O instead of I(O) to simplify the notation.

Definition 1. A comparison criterion [4] is a function φ_i: D_i×D_i → L_i which is
associated with each feature x_i (i=1,...,n), where:

φ_i(x_i(O), x_i(O)) = min{y}, y∈L_i, if φ_i is a dissimilarity comparison criterion
between values of variable x_i, or

φ_i(x_i(O), x_i(O)) = max{y}, y∈L_i, if φ_i is a similarity comparison criterion
between values of variable x_i, for i=1,...,n. φ_i is an evaluation of the degree of
similarity or dissimilarity between any two values of the variable x_i. L_i,
i=1,...,n, is a totally ordered set; usually it is considered as L_i=[0,1].

Definition 2. A function β: (D_1×...×D_n)² → L, where L is a totally ordered set,
is called a similarity function [8].

The similarity function is defined using a comparison criterion for each attribute. 

Definition 3. In a clustering process (and also in supervised classification), a
Data Set (DS) is a collection of object descriptions such that the size of the set
of descriptions together with the size of the result of comparing all object
descriptions (the similarity matrix) does not exceed the available memory size. A
Large Data Set (LDS) is one for which only the size of the set of descriptions does
not exceed the available memory size. A Very Large Data Set (VLDS) is one for which
both sizes exceed the available memory size [9].

Definition 4. Let β be a similarity function and β₀∈L a similarity threshold. Then
two objects O_i, O_j∈Ω are β₀-similar if β(O_i,O_j)≥β₀. If β(O_i,O_j)<β₀ for all
O_j∈Ω, then O_i is a β₀-isolated object.

Definition 5 (Intra_i criterion). Let C_1,...,C_k be k clusters obtained after
applying a clustering criterion with a certain β₀. The intra-cluster similarity
criterion (Intra_i) is defined as follows:

$$
\mathrm{Intra\_i}(C_i) =
\begin{cases}
\max_s & \text{if } \max_{C_i} = \min_{C_i} = \max_s \\
\max_{C_i} - \min_{C_i} & \text{if } \max_{C_i} \neq \min_{C_i} \\
\min_s & \text{if } \max_{C_i} = \min_{C_i} \neq \max_s
\end{cases}
$$

where max_s and min_s are the maximum and minimum similarity values that β, the
similarity function, can take (e.g. max_s=1.0 and min_s=0.0 if L=[0, 1]), and
max_Ci and min_Ci are the maximum and minimum similarity values among objects
belonging to the cluster C_i.
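
As an illustration, the criterion can be written as a small function. The sketch assumes the three-case reading of Definition 5 (max_s when the maximum and minimum of the cluster coincide at max_s, their difference when they differ, min_s when they coincide below max_s) and L=[0, 1]; the function and parameter names are ours.

```python
def intra_i(max_c: float, min_c: float,
            max_s: float = 1.0, min_s: float = 0.0) -> float:
    """Intra-cluster criterion for one cluster.

    max_c, min_c: largest and smallest similarity among the cluster's objects.
    max_s, min_s: largest and smallest value the similarity function can take.
    """
    if max_c != min_c:
        return max_c - min_c          # spread of similarities in the cluster
    # All similarities in the cluster are equal.
    return max_s if max_c == max_s else min_s

singleton = intra_i(1.0, 1.0)     # e.g. a one-object cluster -> max_s
spread = intra_i(0.75, 0.25)      # ordinary cluster -> difference
uniform = intra_i(0.7, 0.7)       # equal similarities below max_s -> min_s
```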

According to Han and Kamber [10] a good clustering method will produce high 
quality clusters (well-defined clusters), with high intra-cluster similarity and low inter- 
cluster similarity. 

The proposed Intra_i criterion was inspired by two facts. First, the Intra_i
criterion gives a low weight to those clusters where the difference between the
maximum and the minimum similarity between objects is low (or null). It also gives a
high weight to clusters formed by only one object, because these clusters may be
outliers or noise, and their global contribution can produce inadequate results.

Second, in this graph-theoretic approach to unsupervised classification there are
two trivial solutions: β₀=min_s, which yields one cluster with all the objects of
the sample, and β₀=max_s, which yields m clusters, each formed by only one object.

Then, as β₀ is increased from 0.0 to 1.0, Intra_i takes several values as a
function of the difference between the maximum and the minimum similarity between
objects in the different clusters of each clustering. Therefore, we propose that a
reasonable way to determine an adequate value of β₀ (associated with a well-defined
clustering) is to select the clustering (set of clusters) with the minimum average
intra-cluster similarity (Intra_i).



4 The Method for Determining β₀

In this section, we describe the proposed method to obtain a threshold β₀ such that
a well-defined clustering is associated with this value.

4.1 Description of the Method 

The proposed method works in the following way. First, the threshold β₀ is
initialized with the minimum value of similarity between objects (β₀=0.0). After
this, in order to generate several clusterings, the method runs a loop that
increases the threshold value by a small constant (INC) and generates a clustering
with this β₀ value using the GLC algorithm. For each cluster in a clustering, the
maximum and minimum values of similarity among objects are calculated and the
Intra_i criterion is computed, and the average Intra_i of the clustering is
obtained. This process continues until β₀ reaches the maximum similarity value
(β₀=1.0). Finally, the method takes the minimum of these averages, obtaining the
threshold β₀ that generates a well-defined clustering in the data set.
The increase value (INC) for β₀ can take several values, depending on the accuracy
required in the problem (for example 0.1, 0.15, 0.5, 0.01, etc.). Two ways to handle
this parameter in our method are proposed. The first option simply consists in
adding INC to β₀ until it reaches the maximum similarity value. The second option is
proposed for similarity functions that depend on comparison functions for features
(φ_t, t=1,...,n) and have the form

$$\beta(O_i, O_j) = \frac{\left|\{x_t \mid x_t \in R,\ x_t(O_i),\ x_t(O_j) \text{ are similar according to } \varphi_t\}\right|}{n}$$

where n is the number of attributes. In this case, the value of INC is fixed as
INC=1.0/n, if L=[0.0, 1.0]. In this way, the values of β₀ between h/n and (h+1)/n,
h=1,...,n-1, do not generate any change in the clustering.
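
A similarity function of this form, and the resulting choice of INC, can be sketched as follows. The sketch assumes that each comparison criterion is Boolean (two values are either similar or not) and uses plain equality for categorical features; the feature values and all names are hypothetical.

```python
def similarity(obj_a, obj_b, criteria):
    """Fraction of features whose two values are judged similar."""
    similar = sum(1 for a, b, is_similar in zip(obj_a, obj_b, criteria)
                  if is_similar(a, b))
    return similar / len(criteria)

def equal(a, b):
    """Comparison criterion for a categorical feature: plain equality."""
    return a == b

# Two hypothetical objects described by n = 4 categorical features.
obj_1 = ("red", "flat", "none", "brown")
obj_2 = ("red", "bell", "none", "white")
criteria = [equal] * 4

beta = similarity(obj_1, obj_2, criteria)   # 2 of 4 features agree
inc = 1.0 / len(criteria)                   # INC = 1.0/n
```

Because such a function can only take the values 0, 1/n, 2/n, ..., 1, stepping β₀ by 1.0/n visits every threshold at which the clustering can change.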

The method proposed in this paper is the following:

Input:  Ω (data set), INC (β₀ increase value)
Output: β₀ (threshold calculated)

β₀ = min_s
Repeat
    CC_k = GLC(β₀)
    C_ik = Clusters(CC_k, i),   i = 1,...,h_k
    max_Cik = Max(C_ik, i),     i = 1,...,h_k
    min_Cik = Min(C_ik, i),     i = 1,...,h_k
    value_ik = Intra_i(C_ik, max_Cik, min_Cik)
    meanCC_k = Σ value_ik / |CC_k|
    β₀ = β₀ + INC
until β₀ = max_s
β₀ = β₀_min_average_Intra_i(meanCC_k)

The GLC(β₀) function calls the GLC algorithm in order to build a clustering (i.e.
CC_k) with a specific similarity threshold β₀, applying the connected components
criterion. The function Clusters(CC_k, i) returns the cluster i from the clustering
CC_k. The functions Max(C_ik, i) and Min(C_ik, i) return the maximum and minimum
values of similarity among objects for the cluster i in the clustering CC_k. The
Intra_i(C_ik, max_Cik, min_Cik) function calculates and returns the intra-cluster
criterion for the cluster i. |CC_k| denotes the number of clusters in a clustering.
Finally, β₀_min_average_Intra_i(meanCC_k) obtains the minimum average value of the
Intra_i criterion and returns the β₀ associated with it.

The threshold β₀ obtained by the method indicates that the clustering associated
with β₀ is a well-defined clustering.
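
The whole procedure can be sketched on a toy data set. This is an illustration under several assumptions and not the authors' implementation: a simple union-find pass stands in for the GLC algorithm (it compares all pairs, so it is only suitable for small samples), a one-object cluster is given the value max_s, ties between equal averages are resolved in favour of the smallest β₀, and the data and similarity function are made up.

```python
from itertools import combinations

def connected_components(n, linked):
    """Group objects 0..n-1 into components; linked(i, j) tells if an edge exists."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if linked(i, j):
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def intra_i(cluster, sim, max_s=1.0, min_s=0.0):
    """Intra_i criterion of one cluster (three-case reading of Definition 5)."""
    if len(cluster) == 1:
        return max_s                         # assumption for one-object clusters
    values = [sim(i, j) for i, j in combinations(cluster, 2)]
    high, low = max(values), min(values)
    if high != low:
        return high - low
    return max_s if high == max_s else min_s

def best_threshold(n, sim, steps):
    """Try beta0 = k/steps, k = 0..steps; return (beta0, average Intra_i) of the minimum."""
    best = None
    for k in range(steps + 1):
        beta0 = k / steps                    # avoids accumulating rounding error
        clusters = connected_components(n, lambda i, j: sim(i, j) >= beta0)
        mean = sum(intra_i(c, sim) for c in clusters) / len(clusters)
        if best is None or mean < best[1]:
            best = (beta0, mean)
    return best

# Toy sample: two well-separated groups on a line (hypothetical data).
points = [0, 1, 2, 50, 51, 52]

def sim(i, j):
    return 1.0 - abs(points[i] - points[j]) / 100.0

beta0, mean_intra = best_threshold(len(points), sim, steps=10)
```

On this sample the two trivial solutions score 0.51 (one cluster) and 1.0 (six one-object clusters), while the thresholds that separate the two groups score about 0.01, so one of them is selected.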
5 Experimental Results



In this section, two examples of the application of the method, to a data set and
to a large data set, are presented.

The first data set (DS1) contains 350 numerical objects in 2D, and it is shown in
figure 1(a). The clusters shown in this figure have several shapes, sizes and
densities, and they are not linearly separable. The method was applied to DS1 and 7
clusterings were obtained. The thresholds β₀ obtained for each clustering (CC_k) are
as follows:

CC1: β₀=0.00; No. Clusters = 1; Average Intra_i = 0.8631;

CC2: β₀=0.89; No. Clusters = 2; Average Intra_i = 0.4777;

CC3: β₀=0.90; No. Clusters = 3; Average Intra_i = 0.3095;

CC4: β₀=0.93; No. Clusters = 4; Average Intra_i = 0.2772;

CC5: β₀=0.94; No. Clusters = 5; Average Intra_i = 0.2378;

CC6: β₀=0.98; No. Clusters = 86; Average Intra_i = 0.6418;

CC7: β₀=0.99; No. Clusters = 350; Average Intra_i = 1.0000;

The minimum value of these averages determines the value β₀=0.94, which
corresponds to a well-defined clustering (i.e. CC5), formed by the clusters shown in
figure 1(b). The same well-defined clustering was obtained in [5]. For this example,
the value INC=0.01 was employed.
Fig. 1. (a) The objects corresponding to DS1; (b) clustering obtained with β₀=0.94
for DS1 (well-defined clustering discovered)



The second data set used for experimentation was a Mushroom database [11]. The 
mushroom data set (a LDS, according to our definitions) contains records with infor- 
mation that describes the physical characteristics of a single mushroom (e.g. color, 
odor, shape, etc.). This data set contains 8124 records. All attributes are categorical, 
and contain missing values. Each record also contains a poisonous or edible label for 
the mushroom. 

In order to show the behavior of the proposed method, several clusterings were
obtained with their respective β₀ values; they are presented in tables 1 and 2.
Again, we show the well-defined clustering generated for DS2, which corresponds to
the value β₀=0.9545, generating 23 clusters, with an average Intra_i value (AIV) of
0.2115. The same well-defined clustering was obtained in [9]. For this experiment
the value INC=0.0454=1.0/22 (number of features = 22) was used.

Determination of Similarity Threshold in Clustering Problems for Large Data Sets

The cases with β₀=0.00, AIV=0.8182 and β₀=0.99, AIV=1.0 are not shown, because
in the first case the clustering obtained is a single cluster with all the objects,
and in the second case each cluster contains only one object.

The notation used in tables 1 and 2 is as follows: CN denotes the cluster number;
NE indicates the number of edible mushrooms; NP denotes the number of poisonous
mushrooms, and AIV indicates the average Intra_i value.

The experiments were implemented in C on a personal computer with a Pentium
processor at 833 MHz and 128 MB of RAM.



Table 1. Clusters obtained for DS2 with β₀=0.6810; β₀=0.7273; and β₀=0.7727
6 Conclusions

The method proposed in this paper obtains a well-defined clustering, based on
an intra-cluster similarity criterion.

The method gives a threshold value β₀ that yields a well-defined clustering for
large data sets.

The method does not make any assumptions about the shape, size or density of the
resultant clusters in each generated clustering. However, the proposed method is
still susceptible to noise.

Our method uses the β₀-connected component criterion for clustering. As future
work we will generalize the proposed Intra_i criterion in order to handle other
clustering criteria, such as those presented in [4].
Abstract. Current technology allows the acquisition, transmission, storing,
and manipulation of large collections of images. A way to achieve
this goal is the automatic computation of features such as color, texture, 
shape, and position of objects within images, and the use of the features 
as query terms. 

In this paper we describe some results of a study on similarity evaluation 
in image retrieval using shape, texture, color and object orientation and 
relative position as content features. A simple system is also introduced 
that computes the feature descriptors and performs queries. 



1 Introduction 

Current technology allows the acquisition, transmission, storing, and manipula- 
tion of large collections of images. Content based information retrieval is now a 
widely investigated issue that aims at allowing users of multimedia information 
systems to retrieve images coherent with a sample image [1]. A way to achieve 
this goal is the automatic computation of features such as color, texture, shape, 
and position of objects within images, and the use of the features as query terms. 

Content-based retrieval can be divided into the following steps:

Preprocessing: The image is first processed in order to extract the features, which 
describe its contents. The processing involves filtering, normalization, segmen- 
tation, and object identification. The output of this stage is a set of significant 
regions and objects. 

Feature extraction: Features such as shape, texture, color, etc. are used to de- 
scribe the content of the image. Image features can be classified into primitive. 
We can extract features at various levels. 

The basic image retrieval system based on this concept is shown in Figure 1. 

The main difference between our system and others is the manner in which
similarity between a query image and images in a database is computed. For
query images, we first compute the ROI (Region of Interest) and extract a set of
color, texture and shape features by applying color histogram computation, Gabor
texture extraction and shape parameter computation. The query is first processed
by the color feature computation unit; the Gabor texture unit then takes the
query results of the color unit as input. The Gabor texture unit compares the
texture information of the images and discards those whose color information is
similar to that of the query image but whose texture information differs
markedly. Next, the shape parameter computation unit is applied to the results
of this stage. Its output constitutes the final query results.

Fig. 1. Image retrieval system
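The staged query described above can be sketched as a simple cascade in which each unit only sees the survivors of the previous one. The scalar features, the thresholds and the helper names (`cascade_query`, `feature_distance`) are hypothetical stand-ins for the real color, texture and shape descriptors.

```python
def cascade_query(query, database, stages):
    """Apply (distance, threshold) stages in order, keeping images within threshold."""
    candidates = list(database)
    for distance, threshold in stages:
        candidates = [img for img in candidates if distance(query, img) <= threshold]
    return candidates


def feature_distance(name):
    """Build a toy distance on one scalar feature (stand-in for a real descriptor)."""
    return lambda a, b: abs(a[name] - b[name])


database = [
    {"id": "A", "color": 0.10, "texture": 0.20, "shape": 0.30},
    {"id": "B", "color": 0.12, "texture": 0.90, "shape": 0.30},
    {"id": "C", "color": 0.80, "texture": 0.20, "shape": 0.30},
]
query = {"color": 0.10, "texture": 0.25, "shape": 0.35}
stages = [(feature_distance(n), 0.1) for n in ("color", "texture", "shape")]
print([img["id"] for img in cascade_query(query, database, stages)])
```

In this toy run image B passes the color stage but is discarded by the texture stage, which is the behaviour the text attributes to the Gabor texture unit.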

2 Feature Extraction 

2.1 Color Features 

We propose a new color feature called color correlogram which describes the 
global distribution of local spatial correlations of colors and the size of this 
feature is fairly small [2,3,4]. 

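For a pixel $p = (x, y) \in F$, let $F(p)$ denote its color. The histogram $h$ of $F$ is
defined for $i \in [c]$, where $c$ is the number of colors $c_1, \ldots, c_c$, as

$$ h_{c_i}(F) = N \cdot M \cdot \Pr_{p \in F} [p \in F_{c_i}] \qquad (1) $$

where $F_{c_i}$ denotes the set of pixels of color $c_i$; the probability term
gives the probability that the color of the pixel is $c_i$.

The correlogram of $F$ is defined for $i, j \in [c]$, $k \in [d]$ as

$$ \gamma^{(k)}_{c_i, c_j}(F) = \Pr_{p_1 \in F_{c_i},\, p_2 \in F} \left[ p_2 \in F_{c_j} \;\middle|\; |p_1 - p_2| = k \right] \qquad (2) $$

Given any pixel of color $c_i$ in the image, $\gamma^{(k)}_{c_i, c_j}$ gives the probability that a
pixel at distance $k$ away from the given pixel is of color $c_j$. To compute the
distance between images $F$ and $F'$ we compare histograms and correlograms:

$$ |F - F'|_{h} = \sum_{i \in [c]} \left| h_{c_i}(F) - h_{c_i}(F') \right| \qquad (3) $$

$$ |F - F'|_{\gamma} = \sum_{i, j \in [c],\, k \in [d]} \frac{\left| \gamma^{(k)}_{c_i, c_j}(F) - \gamma^{(k)}_{c_i, c_j}(F') \right|}{1 + \gamma^{(k)}_{c_i, c_j}(F) + \gamma^{(k)}_{c_i, c_j}(F')} \qquad (4) $$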


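Given the histograms for a template $T$ and an image $F$, the intersection of
these two histograms is defined as

$$ H_{c_i}(T \cap F) = \min \{ H_{c_i}(T), H_{c_i}(F) \} \qquad (5) $$

and

$$ h_{c_i}(T \cap F) = \frac{H_{c_i}(T \cap F)}{|T|} \qquad (6) $$

The intersection correlogram is defined as the correlogram of the intersection
$T \cap F$:

$$ \gamma^{(k)}_{c_i, c_j}(T \cap F) = \frac{\Gamma^{(k)}_{c_i, c_j}(T \cap F)}{H_{c_i}(T \cap F) \cdot 8k} \qquad (7) $$

where

$$ \Gamma^{(k)}_{c_i, c_j}(F) = \left| \{ p_1 \in F_{c_i},\; p_2 \in F_{c_j} \;|\; |p_1 - p_2| = k \} \right| \qquad (8) $$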



2.2 Gabor Features 

The differential structure of an image is completely extracted by the convolu- 
tion with the Gaussian filter family. We use Gabor filters in our image retrieval 
system. This results in the family of Gabor filters covering the total spatial fre- 
quency plane nearly uniformly. Filtering an image with Gabor kernel can be 
interpreted as local Fourier analysis. The known good characteristics of Gabor 
filters for image analysis can be justified in scale space framework [5,6]. Gabor 
filters are used in analysing the property of an object in the selected image be- 
cause they have optimal joint localization (resolution) in both the spatial and the 
spatial frequency domains. The frequency tuning of filters allows an axiomatic
characterization of Gabor filters as the linear, shift-invariant family of trans-
formations which (i) is parameterized by a scale parameter with a semi-group
structure, and (ii) is scale invariant, i.e. the function that relates the observables is
independent of the choice of dimensional units.



Gabor functions are Gaussians modulated by complex sinusoids. In its general
form, the two-dimensional Gabor function and its Fourier transform can be
written as [7,8]

$$ h(x, y) = g(x, y) \exp(j 2 \pi F x) \qquad (9) $$

where $F$ is the radial center frequency and $g(x, y)$ is the 2D Gaussian

$$ g(x, y) = \frac{1}{2 \pi \sigma_x \sigma_y} \exp \left\{ -\frac{1}{2} \left[ \left( \frac{x}{\sigma_x} \right)^2 + \left( \frac{y}{\sigma_y} \right)^2 \right] \right\} \qquad (10) $$

where $(\sigma_x, \sigma_y)$ characterize the spatial extent and bandwidth of the Gabor filter
$h(x, y)$.

The aspect ratio of $g(x, y)$ is given by $\lambda = \sigma_x / \sigma_y$, which gives a measure of the
filter's symmetry. In the frequency domain,

$$ H(u, v) = \exp \left\{ -2 \pi^2 \left[ \sigma_x^2 (u - F)^2 + \sigma_y^2 v^2 \right] \right\} \qquad (11) $$

The set of self-similar Gabor filters is obtained by appropriate rotations and
scalings of $g(x, y)$ through the generating function:

$$ g_{mn}(x, y) = a^{-m} g(x', y'), \quad a > 1, \quad m, n \ \text{integer} \qquad (12) $$

where

$$ (x', y') = \left( a^{-m} [x \cos\theta + y \sin\theta],\; a^{-m} [-x \sin\theta + y \cos\theta] \right) \qquad (13) $$

where $a$ is the scale factor, $n = 0, 1, \ldots, K - 1$ is the current orientation
index, $K$ is the total number of orientations, $m = 0, 1, \ldots, S - 1$ is the current
scale index, $S$ is the total number of scales, and $\theta = n\pi / K$.

The scale factor $a^{-m}$ in equation (12) ensures that the filter energy

$$ E_{mn} = \iint |g_{mn}(x, y)|^2 \, dx \, dy \qquad (14) $$

is independent of $m$.

The scale factor is $a = (f_h / f_l)^{1/(S-1)}$, where $f_l$ and $f_h$ are the lower and upper center
frequencies of interest. In our implementation $f_l = 0.05$, $f_h = 0.4$ and $a = 2$.
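A minimal numerical sketch of these definitions follows. It assumes $S = 4$ scales (the value for which $f_l = 0.05$ and $f_h = 0.4$ give $a = 2$) and $K = 6$ orientations; the function names and the sample values of $\sigma_x$, $\sigma_y$ are illustrative, and the function to be rotated and scaled is passed in as a parameter rather than fixed.

```python
import cmath
import math


def scale_factor(f_low, f_high, n_scales):
    """a = (f_h / f_l) ** (1 / (S - 1))."""
    return (f_high / f_low) ** (1.0 / (n_scales - 1))


def gaussian(x, y, sigma_x, sigma_y):
    """2-D Gaussian envelope g(x, y)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    return norm * math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))


def gabor(x, y, freq, sigma_x, sigma_y):
    """Gaussian modulated by a complex sinusoid of radial frequency freq."""
    return gaussian(x, y, sigma_x, sigma_y) * cmath.exp(2j * math.pi * freq * x)


def self_similar(mother, x, y, m, n, a, n_orient):
    """Rotate by theta = n*pi/K and scale by a**(-m), as in the generating function."""
    theta = n * math.pi / n_orient
    s = a ** (-m)
    xr = s * (x * math.cos(theta) + y * math.sin(theta))
    yr = s * (-x * math.sin(theta) + y * math.cos(theta))
    return s * mother(xr, yr)


a = scale_factor(0.05, 0.4, 4)
mother = lambda x, y: gabor(x, y, 0.4, 2.0, 2.0)
print(round(a, 6), abs(self_similar(mother, 1.0, 0.5, 1, 2, a, 6)))
```

Evaluating `self_similar` on a pixel grid for every pair (m, n) would give the full bank of S·K kernels.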
The Gabor-filtered output of the image is obtained by the convolution of the
image with the Gabor function for each orientation/spatial frequency (scale)
combination. Given an image $I(x, y)$, we filter this image with $g_{mn}(x, y)$:

$$ G_{mn}(x, y) = \sum_k \sum_l I(x - k, y - l) \, g^{*}_{mn}(k, l) \qquad (15) $$

where $*$ indicates the complex conjugate.

After applying Gabor filters to the image we obtain an array of magnitudes

$$ \sum_x \sum_y |G_{mn}(x, y)| \qquad (16) $$

The magnitudes of the Gabor filter responses are represented by three moments:

- the mean $\mu_{mn}$

$$ \mu_{mn} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} |G_{mn}(x, y)| \qquad (17) $$

- the standard deviation $\sigma_{mn}$

$$ \sigma_{mn} = \sqrt{ \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( |G_{mn}(x, y)| - \mu_{mn} \right)^2 } \qquad (18) $$

- the skewness $\kappa_{mn}$

$$ \kappa_{mn} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( \frac{|G_{mn}(x, y)| - \mu_{mn}}{\sigma_{mn}} \right)^3 \qquad (19) $$

The feature vector (FV) is represented as follows:

$$ FV = [\mu_{11}, \sigma_{11}, \kappa_{11}, \ldots, \mu_{SK}, \sigma_{SK}, \kappa_{SK}] \qquad (20) $$
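The three moments can be computed from a filter response as in the following sketch. It assumes the population form of the standard deviation (division by the number of pixels) and returns zero skewness for a constant response; `gabor_moments` and `feature_vector` are illustrative names.

```python
import math


def gabor_moments(magnitudes):
    """Mean, standard deviation and skewness of |G_mn(x, y)| over one response."""
    values = [abs(v) for row in magnitudes for v in row]
    count = len(values)
    mean = sum(values) / count
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / count)
    if std == 0.0:
        return mean, 0.0, 0.0  # constant response: skewness left at zero
    skew = sum(((v - mean) / std) ** 3 for v in values) / count
    return mean, std, skew


def feature_vector(responses):
    """Concatenate the three moments of every filter response."""
    fv = []
    for magnitudes in responses:
        fv.extend(gabor_moments(magnitudes))
    return fv


print(feature_vector([[[1, 1], [3, 3]], [[3 + 4j, 5], [5, 5]]]))
```

Because `abs` is applied to every coefficient, complex filter outputs can be passed in directly.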



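The similarity of a query image $Q$ and an image $T$ in the database is defined
as $d(Q, T)$, where

$$ d(Q, T) = \sum_m \sum_n d_{mn}(Q, T) \qquad (21) $$

where

$$ d_{mn}(Q, T) = \left| \frac{\mu^{(Q)}_{mn} - \mu^{(T)}_{mn}}{\alpha(\mu_{mn})} \right| + \left| \frac{\sigma^{(Q)}_{mn} - \sigma^{(T)}_{mn}}{\alpha(\sigma_{mn})} \right| + \left| \frac{\kappa^{(Q)}_{mn} - \kappa^{(T)}_{mn}}{\alpha(\kappa_{mn})} \right| \qquad (22) $$

where $\alpha(\mu_{mn})$, $\alpha(\sigma_{mn})$ and $\alpha(\kappa_{mn})$ are the normalization terms computed, respectively, for the mean, the standard
deviation and the skewness of the transform coefficients over the database.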

The number of scales chosen is 4 and the number of orientations is 6. Thus 24 Gabor
filters are used in the experiments, which give a 48-dimensional feature vector for
texture classification. The proposed features are found to give 94.35% correct
classification rates.

Fig. 2. Texture image a). The power spectrum of the Gabor transform with 0° (re-
spectively b) and d)) and 60° (respectively c) and e)) orientation for various scales.



2.3 Shape Features 

Shape-based image retrieval measures the similarity between
shapes represented by their features. Shape is an important visual feature and
one of the primitive features for image content description. However, shape
content description is a difficult task because it is difficult to define perceptual
shape features and to measure the similarity between shapes. To make the problem
more complex, shape is often corrupted by noise, defects, arbitrary distortion
and occlusion. Therefore, two steps are essential in shape-based image retrieval:
feature extraction and similarity measurement between the extracted
features.

To characterize the shape we used the following descriptors: principal axis ratio,
compactness and circular variance, which are translation-, rotation- and scale-invariant
shape descriptors, and the seven Hu moments [9,10].

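The principal axes ratio (par) is

$$ par = \frac{c_{yy} + c_{xx} - \sqrt{(c_{yy} + c_{xx})^2 - 4 (c_{xx} c_{yy} - c_{xy}^2)}}{c_{yy} + c_{xx} + \sqrt{(c_{yy} + c_{xx})^2 - 4 (c_{xx} c_{yy} - c_{xy}^2)}} \qquad (23) $$

where the covariance matrix $C$ of a contour is defined as

$$ C = \begin{pmatrix} c_{xx} & c_{xy} \\ c_{yx} & c_{yy} \end{pmatrix} \qquad (24) $$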

Compactness (comp) is the ratio of the perimeter of a circle with the same area
as the original object to the perimeter of the original contour:

$$ comp = \frac{P_{circle}}{P} = \frac{2 \sqrt{A_{circle} \, \pi}}{P} \qquad (25) $$

Circular variance (cv) is the proportional mean-squared error with respect
to a solid circle:

$$ cv = \frac{1}{N \mu_r^2} \sum_{i} \left( \| p_i - \mu \| - \mu_r \right)^2 \qquad (26) $$

where $N$ is the number of contour points, $p_i = (x_i, y_i)$ is a contour point,
$\mu$ is the centroid and $\mu_r$ is the mean radius of the contour.
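The three descriptors can be computed from an ordered list of contour points as sketched below. The sketch assumes that the centroid and the covariance are taken over the contour points themselves, and that area and perimeter are those of the polygon through the points; `shape_descriptors` is an illustrative name.

```python
import math


def shape_descriptors(contour):
    """Return (par, comp, cv) for a closed contour given as a list of (x, y)."""
    n = len(contour)
    mx = sum(x for x, _ in contour) / n
    my = sum(y for _, y in contour) / n

    # Covariance of the contour points and ratio of its eigenvalues.
    cxx = sum((x - mx) ** 2 for x, _ in contour) / n
    cyy = sum((y - my) ** 2 for _, y in contour) / n
    cxy = sum((x - mx) * (y - my) for x, y in contour) / n
    root = math.sqrt(max(0.0, (cyy + cxx) ** 2 - 4.0 * (cxx * cyy - cxy ** 2)))
    par = (cyy + cxx - root) / (cyy + cxx + root)

    # Polygon area (shoelace formula) and perimeter for the compactness.
    closed = contour[1:] + contour[:1]
    area = abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(contour, closed))) / 2.0
    perimeter = sum(math.hypot(x2 - x1, y2 - y1)
                    for (x1, y1), (x2, y2) in zip(contour, closed))
    comp = 2.0 * math.sqrt(math.pi * area) / perimeter

    # Normalized squared deviation of the radii from the mean radius.
    radii = [math.hypot(x - mx, y - my) for x, y in contour]
    mean_r = sum(radii) / n
    cv = sum((r - mean_r) ** 2 for r in radii) / (n * mean_r ** 2)
    return par, comp, cv


print(shape_descriptors([(0, 0), (4, 0), (4, 1), (0, 1)]))
```

For a square sampled at its corners the principal axes ratio is 1 and the circular variance is 0, while an elongated rectangle gives a small principal axes ratio.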



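An object can be represented by the spatial moments of its intensity function $f(x, y)$.
In the spatial case

$$ m_{pq} = \sum_{x=1}^{m} \sum_{y=1}^{n} x^p y^q f(x, y) \qquad (27) $$

The central moments are given by

$$ \mu_{pq} = \sum_{x=1}^{m} \sum_{y=1}^{n} (x - \bar{X})^p (y - \bar{Y})^q f(x, y) \qquad (28) $$

where $(\bar{X}, \bar{Y})$ are

$$ \bar{X} = \frac{m_{10}}{m_{00}} \quad \text{and} \quad \bar{Y} = \frac{m_{01}}{m_{00}} \qquad (29) $$

The normalized central moments $\eta_{pq}$ are

$$ \eta_{pq} = \frac{\mu_{pq}}{(m_{00})^{\alpha}}, \qquad \alpha = \frac{p + q}{2} + 1 \qquad (30) $$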



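Using nonlinear combinations of the lower-order moments, a set of moment
invariants (usually called geometric moments), which have the desirable properties
of being invariant under translation, scaling and rotation, is derived. Hu [11]
employed seven moment invariants, invariant under rotation as well
as translation and scale change, to recognize characters independently of their
position, size and orientation:

$$ \begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02} \\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4 \eta_{11}^2 \\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12}) \left[ (\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2 \right] \\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03}) \left[ 3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2 \right] \\
\phi_6 &= (\eta_{20} - \eta_{02}) \left[ (\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2 \right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12}) \left[ (\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2 \right] \\
       &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03}) \left[ 3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2 \right]
\end{aligned} \qquad (31) $$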
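As a small check of the invariance property, the following sketch computes only the first two invariants from the normalized central moments of a tiny image. It assumes 1-based pixel coordinates and an image stored as a 2-D list of intensities; `hu_first_two` is an illustrative name, and the remaining five invariants follow the same pattern.

```python
def hu_first_two(img):
    """phi_1 and phi_2 of a 2-D list of intensities (1-based pixel coordinates)."""
    pixels = [(x, y, v)
              for y, row in enumerate(img, 1)
              for x, v in enumerate(row, 1) if v]
    m00 = sum(v for _, _, v in pixels)
    xbar = sum(x * v for x, _, v in pixels) / m00
    ybar = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment divided by m00 ** ((p + q) / 2 + 1).
        mu = sum((x - xbar) ** p * (y - ybar) ** q * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


bar = [[1, 1, 0], [0, 0, 0]]
shifted = [[0, 0, 0], [0, 1, 1]]
rotated = [[1], [1]]
print(hu_first_two(bar), hu_first_two(shifted), hu_first_two(rotated))
```

The horizontal bar, its translated copy and its 90-degree rotation yield the same pair of values, as expected from translation and rotation invariance.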

To characterize the shape we used the feature vector

$$ SFV = (\phi_1, \phi_2, \ldots, \phi_7, par, comp, cv) \qquad (32) $$

consisting of the seven moment invariants and the principal axis ratio, compactness
and circular variance descriptors. This vector is used to index each shape in the
database. The distance between two feature vectors is determined by the city block
distance measure.

3 Conclusion

A retrieval methodology which integrates color, texture and shape information
is presented in this paper. Consequently, the overall image similarity is devel-
oped through the similarity based on all the feature components. Experimental
evaluation based on our image database shows that our method promisingly
outperforms the retrieval systems from the literature.

Abstract. A document analysis prototype and its application to the automatic
digitalisation of Portuguese cadastral maps are discussed in this paper. Tuning
off-the-shelf methods, and sometimes extending them, has made it possible to obtain
usable results. These algorithms and their tuning, as well as the results obtained, are
given in the paper. The prototype has been approved for further development into
an integrated system to be used by some Portuguese entities.

Keywords: Cadastral Information System, Map Analysis, Image Processing 



1 Introduction 

In recent years many administrative entities have decided to transfer cadastral information
to a numeric format and have started using electronic management systems. This transfer is done
manually, which is slow and expensive. More than 100000 sheets remain to be digitalised in Portugal,
one of the smallest countries in Europe.

The process is initiated by a binary scan with a resolution of at least 300 dpi. This pro-
vides all the necessary information, since our maps do not contain colors or gray scales.
The Portuguese cadastral map is subject to several rules that maintain a uniformity
exploited by our system. Every entity is composed of a closed contour, a numeric iden-
tification inserted in a parcel circle, possible dependent plots, separation lines between
parcels and a limited word description of each parcel (see figure 1). These are the main
guidelines used by our algorithms.

The Portuguese cadastral map authority is pleased with the results, which can improve
the digitalisation not only by saving time but also in accuracy, since the manual entry of
an entire map is tiring and error-prone.

2 Processing Methods 

The choice of methods for this analysis is mainly conditioned by the enormous
computational effort necessary to complete the task, due to the quantity of information
(each map has around 500 entities, 800 parcels, 2500 characters and a significant amount
of miscellaneous information).

Similar problems have been discussed before [1,2], but to obtain a robust application,
all the formal aspects of the Portuguese cadastral maps need to be reconsidered (see
section 1).



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 627-634, 2003. 
© Springer- Verlag Berlin Heidelberg 2003 






T. Candeias, F. Tomaz, and H. Shahbazkia 




Fig. 1. Overview of a cadastral map sample. This typical map is about 85cm x 75cm representing 
138 ha of real land. The smaller portion of the map represents 4cm x 4cm @ 300dpi on the paper 
map. 



The legal status of cadastral administration demands full robustness; therefore the
applied methods are well known and tested, and are also necessarily fast and accurate [3].

2.1 Geo-Referenced Crosses

These crosses relate the map to its real-world position. They
are present in small numbers, normally well formed (almost no rotation or distortion)
and well distributed (constant spacing among them), which makes their recognition and extraction
very simple to perform through template matching of perpendicular lines scaled as a
function of the image DPI. Recognition rates near 100% are easily obtained, and only
a small percentage of the map is damaged when our neighbour algorithm is applied to
remove them.

2.2 Circles 

Some problems are associated with circle detection and removal in a common cadastral
map: the existence of semi-circles, different scales, and connections with elements defined
as linear (see section 1). This recognition is very important because every parcel
depends on the recognition of its circle.

The Hough transform [4,5] is a well-known process for extracting parametrically defined shapes,
even in the presence of noise or when the patterns are sparsely digitised. Some changes
were made to the original algorithm, and we now use a simple but efficient method
similar to the algorithm presented in [6].

This method also covers the recognition of semi-circles, due to the initial element
labeling (see section 1). To avoid the large storage and search requirements associated
with Hough's algorithm, a local decision is performed for each space point. This, however,
results in strong false circle responses near a "true" circle (see figure 2).

This can easily be solved with a post-processing search and confirmation algorithm
based on the known characteristics of the map and of the circles, which in most cases
conform to a standard. The peaks shown in figure 2 can thus be filtered and
selected through their parameters (distance, radius and power). In the example of figure 2, only
one circle will be filtered out, due to its proximity to the other.
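For orientation, the sketch below shows the classical accumulator form of the circle Hough transform for a known radius, followed by a simple proximity filter on the peaks. It is not the local-decision variant described above; the function names, the number of voting angles and the minimum peak distance are assumptions chosen for illustration.

```python
import math
from collections import Counter


def hough_circle_votes(points, radius, n_angles=36):
    """Each edge point votes for all centers lying at `radius` from it."""
    votes = Counter()
    for x, y in points:
        for t in range(n_angles):
            angle = 2.0 * math.pi * t / n_angles
            cx = round(x - radius * math.cos(angle))
            cy = round(y - radius * math.sin(angle))
            votes[(cx, cy)] += 1
    return votes


def filter_peaks(peaks, min_distance):
    """Keep the strongest peaks; drop any peak too close to one already kept."""
    kept = []
    for (cx, cy), power in sorted(peaks, key=lambda p: -p[1]):
        if all(math.hypot(cx - kx, cy - ky) >= min_distance
               for (kx, ky), _ in kept):
            kept.append(((cx, cy), power))
    return kept


# Synthetic edge points on a circle of radius 5 centered at (10, 10).
points = [(10 + 5 * math.cos(2 * math.pi * t / 36),
           10 + 5 * math.sin(2 * math.pi * t / 36)) for t in range(36)]
votes = hough_circle_votes(points, 5)
print(votes.most_common(1)[0][0])
print(filter_peaks([((10, 10), 36), ((11, 10), 12), ((40, 40), 30)], 5))
```

The proximity filter plays the role of the confirmation step: of two nearby candidate centers, only the stronger one survives.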
Fig. 2. Hough transform: the local decision accepts two circle centers



The results are very good, which indicates a solid working algorithm. Indeed,
with respect to the cadastral standards, we can correctly recognise, for a general map
(containing about 300 circles and 90 semi-circles), about 94% of circles and 90% of
semi-circles. The percentage of detected but nonexistent circles is below 0.02% in most
maps.



2.3 Dash 

The recognition and extraction of dash elements is imperative for the overall result.
Like the contours of parcels (see section 2.5), dashes represent inner borders between
different land uses. The main difficulty is the similarity of the dashes to other
components of the map, such as noise lines or parts of alphanumeric elements. A real set of
dashes can easily be mistaken for other elements of the map due to its size, shape
or context, even by a human observer.

A density calculation is applied after a normalizing rotation using central mo-
ments [7]. In this way, all elements which are not linear and dense are filtered out. A
neighbourhood check is also performed to ensure that the dash appears locally at an al-
most constant frequency.

Clearly, this method cannot recognise all the dashes (corner or rounded
dashes) but, most importantly, non-dash elements are not labelled as dashes by mistake. A
post-segmentation step is used to join the dash centers and obtain the inner contours of the parcel.



2.4 Symbol Recognition 

Symbol recognition is important to identify a parcel's type and number, and also to detect
important information outside the analysed zone, such as the total number of parcels.

The symbols to be recognised are digits and some characters, which can have different
sizes. The set of characters is small because only a limited set of words describes a parcel's
type. After analysing different feature extraction methods [8], we chose to implement
the zoning algorithm [9].

A database was built of all possible characters, free of noise. The database
can contain several variants of the same symbol to ensure correct recognition, so it is important to preserve
the database's consistency. This is achieved by allowing a new symbol to be added only if
its distance to every other symbol is at least 10%.

The classification is implemented using a distance algorithm between two patterns,
which measures their dissimilarity. This method compares corresponding
squares of the two patterns (zoning). The squares are sized in proportion
to the pattern size. Analysing and comparing each pair of squares yields a score
that increases or decreases the match. A large size difference between patterns can
also decrease the score.

Using this local matching instead of a global approach makes it possible to obtain a
size-invariant method.

After applying the distance algorithm between the test symbol and each pattern
in the database, a list of all matching percentages is obtained. This list is later used to
improve the hit rate using a cadastral dictionary.
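A minimal sketch of zoning with proportional squares is given below. It assumes binary patterns stored as 2-D lists, uses the ink density of each zone as the feature, and takes the mean absolute density difference as the distance; the function names and the 2 x 2 grid are illustrative and do not reproduce the scoring rules of the prototype.

```python
def zoning(pattern, zones):
    """Ink density in each cell of a zones x zones grid scaled to the pattern size."""
    h, w = len(pattern), len(pattern[0])
    features = []
    for r in range(zones):
        r0, r1 = r * h // zones, (r + 1) * h // zones
        for c in range(zones):
            c0, c1 = c * w // zones, (c + 1) * w // zones
            cells = [pattern[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            features.append(sum(cells) / len(cells) if cells else 0.0)
    return features


def zoning_distance(a, b, zones=2):
    """Mean absolute difference between the zone densities of two patterns."""
    fa, fb = zoning(a, zones), zoning(b, zones)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


small = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 1]]
# The same symbol scanned at twice the size.
big = [[v for v in row for _ in range(2)] for row in small for _ in range(2)]
print(zoning(small, 2), zoning_distance(small, big))
```

Because the zones scale with the pattern, the small and the enlarged symbol produce identical features, which is the size invariance referred to above.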



2.5 Contour Detection 

As in section 2.3, the contour extraction gives a list of points which represent the parcel's
coordinates. These points define a list of line segments which constitute a closed polyline.

The contour extraction is composed of three stages: detection, correction and vec-
torization. The contour lines can be of two types: continuous or dashed. Parcels
are separated by continuous lines, while dashed lines split sub-parcels. The process of
detection and correction is different in each case, while vectorization is the same.

The precision of the extraction is important and must be considered. To increase pre-
cision, two neighbouring parcels should share the same common line segment. To contribute
to the robustness of the whole extraction process, a contour is represented by its medial axis
points [10]; otherwise the shared line segments would not coincide. The contour
extraction algorithm is therefore applied to the processed medial axis image.



Detection. The main problem of contour detection is the existence of discontinuities
caused by bad scanning or by noise. Two strategies are possible:
restore the discontinuities, or use algorithms which are not sensitive to them. To restore them, line
following algorithms [11] could be used. But this kind of algorithm raises other problems;
for example, where lines intersect, the question is which one
to follow. Such a solution would introduce decision problems that slow down the
process. The authors have therefore decided to use algorithms which are insensitive to
discontinuities.

Initially, active contour models [12] were considered, but due to the lack of a distinct
energy field they could not be applied. As every parcel contains only one circle and this
information is reliable, it made sense to create an algorithm that is insensitive to discontinuities
and starts from a known point inside the contour.

The first attempt to solve this problem is a fill algorithm, starting at a point inside
a parcel and reaching all critical squares¹ of the contour while filling square after
square.

This works perfectly on an image in which every
contour is closed. Since this is not the case, a fill with blocks is applied
instead of a normal fill, which makes the method insensitive to discontinuities. Discontinuities
smaller than or equal to a block are handled perfectly, while larger ones can cause
problems. The block size is variable, but increasing or decreasing it can cause
problems in some parcels of the map.
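The block fill can be sketched as a flood fill over square blocks rather than pixels. The sketch assumes a binary image stored as a 2-D list with 1 for black pixels, a fixed block size, and a seed block known to lie inside the parcel; `block_fill` is an illustrative name.

```python
def block_fill(img, seed_block, block):
    """Flood-fill empty blocks from seed_block; blocks holding black pixels
    stop the fill and are returned as critical squares."""
    rows, cols = len(img) // block, len(img[0]) // block

    def has_black(br, bc):
        return any(img[y][x]
                   for y in range(br * block, (br + 1) * block)
                   for x in range(bc * block, (bc + 1) * block))

    filled, critical = set(), set()
    stack = [seed_block]
    while stack:
        br, bc = stack.pop()
        if (br, bc) in filled or (br, bc) in critical:
            continue
        if not (0 <= br < rows and 0 <= bc < cols):
            continue
        if has_black(br, bc):
            critical.add((br, bc))
            continue
        filled.add((br, bc))
        stack.extend([(br + 1, bc), (br - 1, bc), (br, bc + 1), (br, bc - 1)])
    return filled, critical


# 12 x 12 image with a rectangular contour that has a one-pixel gap.
img = [[0] * 12 for _ in range(12)]
for i in range(1, 11):
    img[1][i] = img[10][i] = img[i][1] = img[i][10] = 1
img[1][5] = 0
filled, critical = block_fill(img, (1, 1), 3)
print(sorted(filled), len(critical))
```

A pixel-wise fill would leak through the gap in the top edge, whereas the 3 x 3 blocks still see contour pixels next to the gap and stop there.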

Another approach is to use a quadtree-based algorithm for the parcel segmentation.
The main idea is similar to the fill algorithm, but now the squares are of different sizes.
The algorithm starts by finding the biggest square inside a parcel such that no
collision exists with black pixels.

The biggest square is filled and, in an iterative way, a 4-neighbourhood expansion
occurs with squares of the same size. Each new empty square is filled until no
more empty squares of that size are available. Then each non-empty square is
split into four equal parts, and the process continues iteratively.

This improvement is important because it gives sensitivity to the square size, and discon-
tinuities can be detected because the square size is known at each iteration.
The algorithm stops when a square larger than the one currently being processed is detected
that is empty of contour pixels.

After the detection of critical squares is complete, the contour points are
detected. This is implemented by considering the filled neighbouring squares and detecting all
contour points in the opposite direction. After all these procedures are complete, a list
of points representing the contour is obtained for each parcel.

The dashed lines, which also represent contour lines, are detected from the
dash elements already recognised (see section 2.3).

Correction. After detecting the contour points some discontinuities are found. These
discontinuities are due to occlusions already present in the map and therefore need to be corrected.

Continuous contour lines are corrected using a line following algorithm [11],
performing a linear interpolation when there are no neighbouring pixels to follow.

Dashed lines are corrected in a more complex manner. The endpoints of the detected
dashes are placed in a list; then a linear interpolation is made between nearby endpoints
that do not belong to the same dash. After uniting all the dashed line components,
the contour that is joined to the dashes can be erased. This erosion is done by detecting the
endpoints that were not interpolated (absolute extremes).

Vectorization. Before applying the vectorization method described in [13,14] it is nec-
essary to convert the contour points into a chain code.

Since the contour is made of line segments, the chosen method was that of Rosin & West,
using the split-and-merge approach.
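The split step of such a polyline approximation can be sketched as a recursive subdivision at the point of maximum deviation. This is a generic illustration in the style of the Ramer-Douglas-Peucker procedure, not the Rosin & West method itself; the fixed tolerance `tol` and the function name `split` are assumptions.

```python
import math


def split(points, tol):
    """Approximate an ordered point list by a polyline whose maximum
    deviation from the points is at most tol."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    length = math.hypot(x2 - x1, y2 - y1)

    def deviation(p):
        if length == 0.0:
            return math.hypot(p[0] - x1, p[1] - y1)
        # Perpendicular distance from p to the line through the two end points.
        return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / length

    idx, dmax = max(((i, deviation(p)) for i, p in enumerate(points[1:-1], 1)),
                    key=lambda t: t[1])
    if dmax <= tol:
        return [points[0], points[-1]]
    left = split(points[:idx + 1], tol)
    right = split(points[idx:], tol)
    return left[:-1] + right


contour = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(split(contour, 0.5))
```

For the L-shaped run of points only the two end points and the corner remain, giving two line segments.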

An example of contour extraction can be seen in figure 3. 

¹ The critical squares in this case are squares which contain black pixels and which
are therefore known to contain contour points.

Fig. 3. Example of the extraction process of a parcel's contours: (a) sample image before
processing; (b) result of contour extraction.



3 Results 

The results were obtained by testing each module with a binary cadastral map of 10784x7853
pixels @ 300 dpi. The time was measured with the C language's built-in timing function on an Athlon
1400 processor with 512 MB of DDR RAM and a 7200 rpm hard disk.

The program was compiled with gcc version 3.2 (Mandrake Linux 9.0 3.2-1mdk)
with CFLAGS = -O2 -march=i686 -mcpu=i686 -funroll-all-loops.

The results can be observed in table 1.



Table 1. Experimental results of the application's modules.

Element       Number of patterns   Performance rate   Processing time
Cross                         35               100%          2:23 min
Circle                       386                94%         10:16 min
Semi-circle                   94                90%         14:32 min
Symbol                      4657                77%         12:56 min
Dash                        2347                84%          4:26 min
Contour                      450                82%         30:54 min



The patterns were classified first manually and then automatically to obtain the hit
rates. Contour results are qualitative, so each vectorized parcel was compared to the
initial map.

Global recognition requires 1 hour and 14 min.



4 Discussion 

The results obtained so far are satisfactory, but can still be improved. In every
analysis there are some problems whose resolution would raise the level
of the overall results.

Circle detection is only problematic when processing small circles. This happens
because the sizes of the unrecognised circles are outside the processing range. This can be
solved by widening the processing interval, at the cost of more processing time.

Semi-circle detection needs the Hough transform to restore lines, which is com-
putationally expensive. Some problems occur in restoring a parcel's contour, related to
the size of the Hough transform window.

Symbol detection is the most problematic because there are many different classes
to classify. The problem is largely solved, because each isolated symbol can be classified
correctly. Problems occur when symbols are connected.

Dash detection is also largely solved, but remains problematic when dashes are connected
to symbols. This may be solved by splitting unknown elements, as in the case of symbols.

Contour extraction can be problematic due to previous processing. Other prob-
lems occur when there are two close discontinuities which are removed after applying
a linear element detection. This could be solved by applying a filter, using size and central
moments, to non-linear elements in order to restore those close to the contour; in this way long
discontinuities are removed.




Fig. 4. Snapshot of the prototype running 



5 Conclusions 

A full assessment of the results can only be carried out when the system is op-
erational. But the simplicity of the prototype and the methods used have proven their
efficiency in solving the given problem. The use of meta-knowledge from the begin-
ning of the analysis makes the system dedicated to the task. In general we can consider
an automatic digitalisation system a success if it manages to extract 80% of the data
with total robustness. Nevertheless, low-level problems such as connected component
labelling, cadastral-section boundary extraction and road labelling, as well as high-level
problems such as semantic consistency and linking of sections, remain hard tasks to be
addressed. A snapshot of the prototype can be seen in figure 4.

Abstract. We present an approach to color image segmentation by applying it to
the recognition and vectorization of geo-images (satellite, cartographic). This is a
simultaneous segmentation-recognition system in which segmented geographical
objects of interest (alphanumeric, punctual, linear, and area) are labeled by the
system with gray-level values that are the same within each type of object but differ
between types. We replace the source image by a number of simplified images. These images are
called composites. Every composite image is associated with a certain image fea-
ture. Some of the composite images that contain the objects of interest are used
in the subsequent object detection-recognition, which associates the
segmented objects with corresponding “names” from the user-defined subject do-
main. The specification of features and object names associated with perspective
composite representations is regarded as a type of knowledge domain, which
allows automatic or interactive learning by the system. The results of gray-level and
color image segmentation-recognition and vectorization are shown.



1 Introduction 

Segmentation is fundamental to the field of image processing because it is used to 
provide the basic representation on which understanding algorithms operate. The abil- 
ity to build up a representation from individual pixels of an image, which exploits 
relationships such as local proximity and highlights the structures of the underlying 
components, is important for the extraction of features during interpretation and rec- 
ognition [1]. In general, the nature of this representation is application dependent. In 
the present work, we developed an application independent segmentation. 

Up to now, a great variety of segmentation algorithms for gray-level images has
been proposed. The majority of color segmentation approaches are based on mono-
chrome segmentation approaches operating in different color spaces [4]. Gray-level
segmentation methods can be directly applied to each component of a color space;
the results can then be combined in some way to obtain a final segmentation result.
However, one of the problems is how to employ the color information as a whole for
each pixel. When the color is projected onto three RGB color components, the color
information is so scattered that the color image becomes simply a multispectral image
and the color information that humans can perceive is lost [2]. Another problem is
how to choose the color representation for segmentation [3], [4]. There is no single
color representation that can surpass others for segmenting all kinds of color images.
The use of nonlinear spaces, such as HSI and the normalized color space, can solve the
problem to a certain approximation. However, the nonlinear spaces have essential, non-
removable singularities and there are spurious modes in the distribution of values [5].

An alternative solution presented in this work is an invariant image representation 
(composite images, or simply composites) that does not depend on the choice of a 
particular color space. A color image is first segmented individually by each color 
component into meaningful regions (that is, regions invariant with respect to a given, 
unnecessary color feature) and then undergoes joint segmentation-recognition of the 
image (or “designing of objects of interest”). Moreover, the prescribed set of features 
is regarded as a type of knowledge domain. The composite image technique includes 
object-fitting compact hierarchical segmentation, binarization of segmented 
images, and synthesis of binary representations. The main goal of image synthesis 
is to link objects by their associated names. In the following sections, we 
build up composite image representations based on object-fitting compact hierarchical 
segmentation. See also [6], [7], [8], and [9]. 



2 Object-Fitting Compact Hierarchical Segmentation, Recognition, and Vectorization 

In our method, the image segments are obtained by an iterative procedure that 
successively increases the admitted gray-level and color thresholds for segment 
merging. The resulting segments form a successively growing compact hierarchical 
structure of flat segment networks. Each segment of this structure can have an ancestor 
or a descendant. The structure obtained in this way is called the adaptive dynamic data 
structure. An image segment is a node of the spatial structure, whose attributes are 
primary numbers defined by the averages of the segment’s color/gray-level features and 
by the set of pixels that represents the area and the shape of the segment (Section 6). 
This allows us to organize object-oriented identification of semantically meaningful 
image regions. Our system has an interactive procedure for compulsory restructuring of 
the segment relationships as a tool for the semantic analysis of visual data. In other 
words, learning and self-learning of the system with the prescribed set of associative 
identifiers are possible in the interactive regime. 

Successive segment merging by some criteria structures the segments into a 
multi-level hierarchy that is represented by dynamic trees [7]. This hierarchy, or 
multi-level image partition, is an efficient means of semantic identification of the 
image’s objects. The relationships between the dynamic tree nodes indicate the 
neighboring semantically meaningful regions. Because the image’s regions are identified 
by the corresponding tree nodes, the neighbor relation between them can be completely 
defined by a table of adjacency. Modifying the dynamic tree (eliminating some edges, 
i.e., segment relationships) modifies the resulting region, and thus more exact object 
detection is achieved. Each level of the tree of segments can be considered as an 
alternative image interpretation in different semantics (see Fig. 3, Section 3 and 
Fig. 5, Section 5). 

The adaptive dynamic tree structure treats the search for meaningful objects as the 
selection of combinations of object features that fit the corresponding ranges, followed 
by the analysis of all admitted areas. This makes it possible to use automatic learning 
algorithms when the set of searched objects is given and it is necessary to define only 
the corresponding feature ranges (this is a natural supposition for geo-images [9]). The 
learning process can be organized as follows. The user selects the appropriate level of 
the segment hierarchy and points out the set of suitable areas. These areas can be 
defined by combining the corresponding segments. Then the program computes the 
characteristics of the located segments and the relationships between them and establishes 
the formal criterion of the search for similar objects. 
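
The learning step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each segment is summarized by a dictionary of numeric features and that the "formal criterion" is a per-feature range spanned by the user-selected examples; the names `learn_ranges`, `matches`, and `margin` are hypothetical and are not taken from the described system.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]

def learn_ranges(examples: List[Features], margin: float = 0.0) -> Dict[str, Tuple[float, float]]:
    """Return, per feature, the (low, high) range spanned by the selected examples."""
    ranges = {}
    for name in examples[0]:
        values = [seg[name] for seg in examples]
        ranges[name] = (min(values) - margin, max(values) + margin)
    return ranges

def matches(segment: Features, ranges: Dict[str, Tuple[float, float]]) -> bool:
    """A segment is accepted when every feature falls inside its learned range."""
    return all(low <= segment[name] <= high for name, (low, high) in ranges.items())

# Segments pointed out by the user at the chosen hierarchy level (made-up values).
selected = [{"mean_intensity": 40.0, "area": 900.0},
            {"mean_intensity": 52.0, "area": 1500.0}]
ranges = learn_ranges(selected, margin=2.0)

# Search for similar objects among other segments.
candidates = [{"mean_intensity": 45.0, "area": 1200.0},
              {"mean_intensity": 120.0, "area": 1000.0}]
found = [seg for seg in candidates if matches(seg, ranges)]
```

A range test is only one possible criterion; relationships between segments, which the text also mentions, are not modelled in this sketch.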

Object-fitting compact hierarchical segmentation is a sequence of embedded partitions 
without repetition of composed segments in different partitions. A partition is 
obtained by iterative segment splitting or merging. In the merging mode, any segment 
in each iteration merges into the nearest adjacent segment. The number $2^i$, where $i$ is 
the iteration number, bounds the total number of segments $N$ generated at each 
iteration [6]. The number $N$ has to be taken into account for automatic color image 
analysis. Indeed, the merging of segments into objects defines the image semantics. The 
image’s semantics in this context corresponds to the association of segment fields of 
different hierarchical levels with identifying concepts from the subject domain. For 
example, detection of a segment identifying a coastline or highway becomes semantically 
meaningful. Further, this set of segments is renamed as “coastline”, “highway”, etc. 
(Fig. 1). 
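
As an illustration of the merging mode, the following Python sketch performs one merging pass over a small set of segments. It is a simplification under stated assumptions: each segment carries a single mean feature value, "nearest" means the adjacent segment with the closest mean, and the adjacency table is a dictionary of neighbor sets. The function name `merge_iteration` is hypothetical.

```python
def merge_iteration(means, sizes, adjacency):
    """One merging pass: every segment joins its most similar adjacent segment."""
    parent = {s: s for s in means}

    def find(s):
        # Follow parent links to the representative of the merged group.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s, neighbours in adjacency.items():
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda n: (abs(means[s] - means[n]), n))
        parent[find(s)] = find(nearest)

    # Merged segments keep size-weighted mean features.
    new_sizes, sums = {}, {}
    for s in means:
        root = find(s)
        new_sizes[root] = new_sizes.get(root, 0) + sizes[s]
        sums[root] = sums.get(root, 0.0) + means[s] * sizes[s]
    new_means = {r: sums[r] / new_sizes[r] for r in sums}

    # Rebuild the table of adjacency between the merged segments.
    new_adjacency = {r: set() for r in new_means}
    for s, neighbours in adjacency.items():
        for n in neighbours:
            a, b = find(s), find(n)
            if a != b:
                new_adjacency[a].add(b)
                new_adjacency[b].add(a)
    return new_means, new_sizes, new_adjacency

# Four segments in a row: two dark ones followed by two bright ones.
means = {0: 10.0, 1: 12.0, 2: 50.0, 3: 53.0}
sizes = {0: 4, 1: 4, 2: 2, 3: 2}
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
new_means, new_sizes, new_adjacency = merge_iteration(means, sizes, adjacency)
```

Repeating the pass on its own output yields the successive levels of a hierarchy; the real system additionally applies increasing gray-level and color thresholds, which this sketch omits.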

Segmented and recognized objects are subsequently vectorized by applying a method 
described in [9] and are finally included into a GIS. These three stages 
(segmentation-recognition-vectorization) constitute Object-Oriented Data Integration 
for GIS [9], [12]. 




Fig. 1. (a) Source image, (b) Segmented image, (c) Recognized image, and (d) Vector image of 
the river 



This is precisely what we mean by the simultaneous segmentation-recognition-vectorization 
process. Fig. 1 shows segmentation-recognition-vectorization of the river 
in a SAR image of Kalimantan Island. 

The number of segments $N$ decreases approximately as $(4 \div 5)^{-i}$, that is, by a factor 
of about 4 to 5 per iteration, where $i$ is the iteration number (Fig. 2a). From our point of 
view, deviation from this exponential dependence leads to violation of the image’s semantics. 
The disclosed regularity can be useful for automatic analysis of gray-level and color images. 




Fig. 2. Linear dependences: (a) Number of segments N on iteration number i, (b) Compressed 
image volume V on iteration number i 



Our experiments have shown (Fig. 2b) that the compressed volume of an image 
decreases in the same exponential mode (cf. image compression with standard lossless 
algorithms: RAR, LZH, etc.). 

Eliminating the dependence on the iteration number $i$, we obtain an exponent-mode 
(power-law) relation between the compressed data volume $V$ and the number of segments $N$: 

$$\frac{N}{N_0} = \left(\frac{V}{V_0}\right)^{\alpha} \qquad (1)$$ 

In equation (1), $N_0$ and $V_0$ denote the number of segments and the compressed volume of 
the source image, respectively; $\alpha$ is some real coefficient. We found that in the case of 
object-fitting compact hierarchical segmentation the exponent $\alpha$ is approximately 2.9. 
Note that for non-adaptive pyramidal segmentation [3] $\alpha$ is approximately 1.4. It is 
known that the volume of compressed data is closely related to the amount of information 
in the data. Thus, a theoretical explanation of the obtained experimental dependencies 
(Fig. 2) represents an interesting research topic in pattern recognition. 
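
To make the role of the exponent concrete, the sketch below shows one way such an exponent could be estimated from measured pairs of segment counts and compressed volumes: taking logarithms of the ratio N/N0 and V/V0 turns the power law into a straight line through the origin whose slope is the exponent. The data here are synthetic, generated from an exponent of 2.9 only to check the fit; they are not measurements from the paper, and `fit_alpha` is a hypothetical helper.

```python
import math

def fit_alpha(n_segments, volumes):
    """Least-squares slope through the origin of log(N/N0) against log(V/V0)."""
    n0, v0 = n_segments[0], volumes[0]
    xs = [math.log(v / v0) for v in volumes[1:]]
    ys = [math.log(n / n0) for n in n_segments[1:]]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Synthetic example: the first entry plays the role of the source image (N0, V0).
volumes = [1000.0, 600.0, 350.0, 200.0]
n_segments = [50000.0 * (v / volumes[0]) ** 2.9 for v in volumes]
alpha = fit_alpha(n_segments, volumes)
```

With real measurements the points would scatter around the line, and the quality of the fit would indicate how well the power law holds.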



3 Composite Image Representation 

We have found that in addition to natural decomposition (e.g., splitting into R, G, and B 
components) of color images, artificial representations can also be useful for 
detection-recognition of objects of interest [6], [7], [8], and [9]. Our approach provides 
composite representations of the source image by means of a reduced number of color or tone 
components and segments. A composite image representation is a sequence of binary 
representations, which are packed into different bit planes. These binary images are 
the result of two-valued classification of the source image by some feature (intensity, 
area, invariant moments, etc.; Section 6). 

A bit component of a composite image (Fig. 3) is computed by means of global dynamic 
thresholding of the current segmented image. The threshold is equal to the average of 
the intensity, geometric, or other feature over the whole image. To threshold the image, 
this global average is compared (greater than or less than) with the average of the same 
feature over the pixels of each segment, with a tuning parameter adjusting the comparison. 

Fig. 3. Bit components of composite images obtained by means of dynamic adaptive thresholding 
of Lena’s source and segmented images 



To compose these images, we also used the geometric features from the feature set 
(Section 6) in addition to the intensity feature. The bit components are packed in the 
resulting representation, where the extrema of intensity indicate the pixels associated 
with an unchanged binary feature. Essentially, the composite images form a “book” in 
which the objects of interest can be found on the appropriate page(s). Thus, a “page 
number” defines the method of thresholding and the tuning parameter. 
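
The thresholding and packing described in this section can be sketched as follows. The sketch rests on assumptions that the text does not fix: the tuning parameter is taken to be a multiplicative factor `k` on the global average, a segment's bit is set when its average exceeds the scaled global average, and bit planes are packed with plane 0 in the lowest bit. The function names are hypothetical.

```python
def bit_component(feature, labels, k=1.0):
    """Binary plane: 1 where the segment's mean feature exceeds k * global mean."""
    pixels = [v for row in feature for v in row]
    global_mean = sum(pixels) / len(pixels)
    totals, counts = {}, {}
    for feature_row, label_row in zip(feature, labels):
        for value, segment in zip(feature_row, label_row):
            totals[segment] = totals.get(segment, 0.0) + value
            counts[segment] = counts.get(segment, 0) + 1
    bit = {s: int(totals[s] / counts[s] > k * global_mean) for s in totals}
    return [[bit[s] for s in label_row] for label_row in labels]

def pack(planes):
    """Pack binary planes into one composite image, plane 0 in the lowest bit."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(p[r][c] << b for b, p in enumerate(planes)) for c in range(cols)]
            for r in range(rows)]

# A 2 x 3 intensity image with three segments (labels 0, 1, 2).
feature = [[10, 10, 200],
           [10, 90, 200]]
labels = [[0, 0, 1],
          [0, 2, 1]]
planes = [bit_component(feature, labels, k) for k in (1.0, 1.5)]
composite = pack(planes)
```

Each value of `k` (and each choice of feature) gives one "page" of the book; pixels whose bits agree across all planes take the extreme values of the packed result.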



4 Color Composites 

Compact hierarchical segmentation of a color image is performed on each independent 
color component (R, G, and B), considering each as a semi-tone image. In this way, 
coinciding intensities of the resulting R, G, and B composite images indicate the segments 
of equal color with respect to the feature used. This can be used for invariant color image 
description. As a rule, compact hierarchical image segmentation implies that color 
segments are enlarged simultaneously in accordance with the regularities presented in 
Section 2. Due to the self-consistency of the RGB-segmentation behavior, visual quality 
improvement in composite intensities becomes available [6]. The method requires 
significant operative memory space. To overcome this disadvantage, we used a special data 
organization in the form of irregular dynamic trees (Section 2 and [6], [7], [8], [9]) 
that provides memory-optimal computing for the successive scanning of image scales. Due to 
this data organization, practical use of our program package does not require further 
algorithmic development. The user only needs to choose, from the prescribed feature set 
(knowledge domain), the features adequate to the task at hand. 



5 Applications of Composites 

Two-valued image classification (binarization) is one of the most important tasks in 
modern recognition methods. Within the framework of the composite image technique, we 
obtained a few solutions for this task [6], [7], [8], and [9]. By applying composites, we 
are able to extract cartographic data using the R, G, and B components of a full-size color 
raster-scanned image (Fig. 4). 




Fig. 4. Cartographic pattern retrieval from a color map image 1082 x 1406 pixels (extreme left) 

Fig. 5 shows how our method ensures object detection in the task of recognizing 
inclined digits embedded in graphics (note that this is an old and very difficult problem 
that has attracted much attention from image processing specialists [1], [4], [9]). 
This illustrates that each composite image contains machine-treatable bit-planes for 
target object detection and also bit-planes that serve no purpose for the task. Indeed, to 
recognize the objects of interest effectively, it is better to search for these objects on the 
appropriate bit-planes. Although our system can generate some errors in interpretation, it is 
much more useful for the subsequent understanding algorithms because its output consists of 
nearly recognized objects of interest. 












Fig. 5. Bit-planes suitable for digit recognition (left side) and other purposes (right side) 



6 Comments 

The presented approach exploits the user’s experience by providing the knowledge domain 
in the form of the prescribed feature-attribute set. This set contains a number of 
attributes and numerous features. The attributes are a primary set of segment 
characteristics estimated and dynamically stored for all image segments at any level of the 
composite image representation. This provides a full-value use of object-fitting 
hierarchical segmentation. The features are numerical segment characteristics, which are 
obtained as the output of data conversion and are selected according to the processing 
stage and the problem context. The prescribed segment attributes are the following: 
1) Extrema of numerical characteristics: a) global (for the whole image), b) local (for 
a neighborhood of the segment); 2) Additive: a) integral intensity (the sum of pixel 
intensities), b) number of pixels, c) integral first and second moments computed with 
respect to the origin; 3) Non-additive: perimeter; 4) Description of the adjacent segments 
in terms of binary relationships. 
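
The value of the additive attributes is that the attributes of a merged segment follow from those of its parts by plain summation, without revisiting pixels. The Python sketch below illustrates this for the additive attributes listed above; the class and field names are hypothetical, and only the x and y second moments are kept for brevity.

```python
from dataclasses import dataclass

@dataclass
class SegmentAttributes:
    intensity_sum: float  # integral intensity (sum of pixel intensities)
    n_pixels: int         # number of pixels
    sum_x: float          # first moments about the origin
    sum_y: float
    sum_xx: float         # second moments about the origin
    sum_yy: float

    def merged(self, other):
        """Attributes of the union of two disjoint segments: plain sums."""
        return SegmentAttributes(
            self.intensity_sum + other.intensity_sum,
            self.n_pixels + other.n_pixels,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xx + other.sum_xx,
            self.sum_yy + other.sum_yy,
        )

    def mean_intensity(self):
        return self.intensity_sum / self.n_pixels

    def centroid(self):
        return (self.sum_x / self.n_pixels, self.sum_y / self.n_pixels)


def from_pixels(pixels):
    """Build attributes from (x, y, intensity) triples."""
    return SegmentAttributes(
        sum(i for _, _, i in pixels),
        len(pixels),
        sum(x for x, _, _ in pixels),
        sum(y for _, y, _ in pixels),
        sum(x * x for x, _, _ in pixels),
        sum(y * y for _, y, _ in pixels),
    )

a = from_pixels([(0, 0, 10), (1, 0, 20)])
b = from_pixels([(2, 0, 30)])
m = a.merged(b)
```

Features such as average intensity or centroid are then derived from the stored sums on demand. The perimeter, being non-additive, cannot be updated this way and has to be recomputed from the merged shape.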

These data provide an estimation of the intensity and geometric segment features: 
pixel intensity range, average intensity, invariant moments, parameters of linear sizes 
and shapes, etc. In this manner, the features used in the generation of the object-fitting 
hierarchy of segments can be different from the features used in object recognition [6], 
[7], [8], and [9]. As a rule, at the first step of image processing, only intensity features 
are useful, because the source image pixels do not form geometrically meaningful segments 
and objects. Consequently, upon reaching an invariant image representation, other 
features are used for object designing and recognition. To our knowledge, this is one 
of the first attempts to design a segmentation-recognition computer system for complex 
color images of arbitrary type (cf. [10] and [12]). 



7 Conclusion 

The problem of how and to what degree the semantic information should be employed 
in image segmentation has led us to the conception of composite image representation 
for mutual object detection-recognition at low level processing. We conjecture that 
modern segmentation systems must support mutual object detection-recognition- 
interpretation, starting at low level, memorizing results at the intermediate level, and 
effectively communicating these results to the high level. The approach proceeding 
from this conjecture is called composite image technique. The idea is to prepare the 
source image as much as possible for subsequent high-level processing of image re- 
gions. In most of the existing color image segmentation approaches, definition of a 
region is based on similarity of color. This assumption often makes it difficult for any 
algorithms to separate the objects with highlights, shadows, shadings or texture, which 
cause inhomogeneity of colors of the object’s surface. Using HSI can solve this prob- 
lem to some extent, except that hue is unstable at low saturation. Some physics-based 
models have been proposed to solve this problem [4], [11]. 

We see an alternative solution to the problem in defining image regions by quantitative, 
qualitative, and nominal features (in addition to the color feature), which on the 
whole render the user’s knowledge domain. We believe that this is a kind of advanced 
simulation of human visual perception. However, it is necessary to emphasize 
that optimizing the labor-intensive training of the program now requires a strong 
formalization of the composite image technique. We are currently working on this problem. 

At the same time, automatic interpretation of color images presents certain difficulties 
for the state of the art in image processing and also in artificial intelligence. To date, 
it appears unrealistic to obtain a fully automatic computer-based interpretation system 
free of errors [4], [9], and [12]. 

We believe that only a system approach to the problem can be fruitful. In the context 
of the present work, this means, first, decomposition of the source image into multiple 
hierarchical components to achieve a stable, accurate representation in the presence of 
degraded images. Second is the segmentation with mutual recognition of appropriate 
primitives (compression stage) and, if required, their vectorization to be directly 
included into an application-oriented database, e.g., GIS (application-dependent stage). 
Finally, there is the development of a unified knowledge-based trainable and 
self-trainable system with optimal human-machine interaction for color image treatment. 



Abstract. We present an approach to identify some geomorphometrical characteristics 
of raster geo-images. The identification involves the generation of raster layers of 
topographic ruggedness and drainage density. The topographic ruggedness expresses the 
amount of elevation difference between adjacent cells of a Digital Elevation Model (DEM) 
and is presented by means of the Terrain Ruggedness Index (TRI). The density layers are 
obtained by the spline interpolation method. These layers are used to represent the 
amount of geographic linear objects. The algorithm has been implemented in the 
Geographical Information System (GIS) Arcinfo and applied to a GIS of 
Tamaulipas State, Mexico. 



1 Introduction 

Geomorphometric analysis is the measurement of the geometry of landforms in raster 
images and has traditionally been applied to watersheds, drainages, hillslopes, and 
other groups of terrain objects. In particular, basin morphometric parameters have attracted 
much attention from hydrologists and geomorphologists, since watersheds have been 
used for the analysis of different physical ecosystem processes [1]. Geomorphometry 
provides one set of recommended variables for analyzing the distribution and concentration 
of certain spatial objects. 

Nowadays, Geographical Information Systems are powerful and useful tools as 
means of information, visualization, and research, or as decision-making applications 
[2]. In contrast with traditional topographic map methods, GIS 
methods are relatively easy to apply in a consistent way to large areas of landscape, 
because they allow summation of terrain characteristics for any region. They can be 
used to provide geomorphometric data and therefore insight into the processes affected by 
terrain morphology for all types of mapping. 

Since the mid-1980s, with the increasing popularity of GIS technology and the availability 
of Digital Elevation Models (DEM), the potential of using DEM in studies of surface 
processes has been widely recognized [3]. New methods and algorithms have been 
developed to automate the procedure of terrain characterization [4]. DEM has been 
used to delineate drainage networks and watershed boundaries, to compute slope 
characteristics, and to produce flow paths [5]. In addition, DEM has been incorporated in 
distributed hydrological models [6]. 

DEM is playing an increasingly important role in many technical fields of GIS de- 
velopment, including earth and environmental sciences, hazard reduction, civil engi- 
neering, forestry, landscape planning, and commercial display. It is difficult to exag- 
gerate the importance of the DEM to geomorphology, because DEM may ultimately 
replace printed maps as the standard means of portraying landforms. The contour 
maps remain an important data source for DEM, although techniques for measuring 
elevation directly from satellite images have been introduced in recent years. Tamau- 
lipas State area is covered by DEM at two resolutions, 50 x 50 m and 250 x 250 m [7]. 

In this paper, we propose a method to perform spatial analysis based on geo-image 
processing by means of the Spatial Analyzer Module (SAM). In Section 2 we present the 
description of SAM and describe its functionality. In Sections 3 and 4, we describe 
how terrain ruggedness and drainage density have been obtained. Some results are 
shown in Section 5. Section 6 presents our conclusions. 



2 Spatial Analyzer Module 

SAM is a special module designed to perform spatial analysis procedures. SAM uses 
vector and raster data for the spatial analysis. This module has 
been implemented using Arc Macro Language (AML) to ensure portability between 
computer platforms executing Arcinfo 7.0 or later. 

The analysis is based on using different spatial data related to the case under study. 
SAM contains two components: the Analysis Block and the List of Procedures. 1) The Analysis 
Block is composed of a set of processes for data analysis. 2) The List of Procedures 
stores the sequence of steps to execute the processes [8] (see Fig. 1). 




Fig. 1. Spatial Analyzer Module is composed of Analysis Block and Procedure List 



2.1 Analysis Block 

The Analysis Block contains the functions for spatial analysis. These functions are the 
following: 

Interpolate Function. The method used is a minimum-curvature spline in two dimensions 
computed from a set of points. For computational purposes, the entire space of the 
output grid is divided into blocks or regions of equal size, which are rectangular in 
shape. Equation (1) shows the spline function that has been used [9]: 

$$S(x, y) = T(x, y) + \sum_{j=1}^{N} \lambda_j R(r_j) \qquad (1)$$ 

where $j = 1, 2, \ldots, N$; $N$ is the number of points; $\lambda_j$ are the coefficients obtained 
from the system of equations, which computes the point coordinates; and $r_j$ is the 
distance from the point $(x, y)$ to the $j$-th point. 

To use this function, it is necessary to provide the set of points and tolerances, 
which depend on the specific case of study. 

Grid Functions. They contain the set of functions for cell analysis that include op- 
erations of the map algebra, and describe how the operations are specified, the data to 
operate on, and the order in which operations should be processed. In this case the 
function is SQRT. SQRT calculates the square root of the input grid [10]. 

Grid Generator. It is used to process some analyzed data, especially in density map 
generation. The vector grids are regular, of m x m magnitude, where m is the cell 
size. The cell magnitude in the grid is determined by the characteristics of the 
phenomenon under study (scale and covered area). Two alternatives can be used to generate 
the grids. The first is to specify the initial and terminal grid coordinates, $(x_0, y_0)$ 
and $(x_1, y_1)$ respectively, and to establish the number of required divisions for the 
grid. The second alternative is to specify the initial coordinate $(x_0, y_0)$, the cell 
size, and the number of columns and rows in the grid [11] (Fig. 2). 
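
The two ways of specifying the grid can be illustrated with a short Python sketch. This is not the AML implementation used in SAM; the function names are hypothetical, cells are represented as `(xmin, ymin, xmax, ymax)` tuples, and for the corner-based variant it is assumed that the number of divisions applies along x and that the number of rows is rounded up so that cells stay square.

```python
import math

def grid_from_origin(x0, y0, cell_size, n_cols, n_rows):
    """Second alternative: origin, cell size, number of columns and rows."""
    return [(x0 + c * cell_size, y0 + r * cell_size,
             x0 + (c + 1) * cell_size, y0 + (r + 1) * cell_size)
            for r in range(n_rows) for c in range(n_cols)]

def grid_from_corners(x0, y0, x1, y1, n_divisions):
    """First alternative: initial and terminal coordinates plus number of divisions."""
    cell_size = (x1 - x0) / n_divisions
    n_rows = math.ceil((y1 - y0) / cell_size)
    return grid_from_origin(x0, y0, cell_size, n_divisions, n_rows)

def centroid(cell):
    """Centre point of a cell, as used for a centroid layer."""
    xmin, ymin, xmax, ymax = cell
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)

# A 3 km x 2 km extent split into 1 km cells, specified both ways.
cells_a = grid_from_corners(0, 0, 3000, 2000, 3)
cells_b = grid_from_origin(0, 0, 1000, 3, 2)
```

Both calls describe the same six cells; which form is more convenient depends on whether the extent or the cell size is the given quantity.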




Fig. 2. Specifications of the grid 



Overlay Functions. This module has been designed to make topological overlays, 
which can be used to identify areas of risk. A set of operations has been defined and 
applied to the spatial analysis in order to establish the conditions and to combine 
different information layers using logical operators. These functions combine spatial 
and attribute data. The operations implemented for topological overlay in this 
application are intersection, union, and identity, which are represented by the symbols ∩, ∪, 
and I, respectively [10]. 
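
The difference between the three overlay operations can be shown with a deliberately simplified Python sketch. Real topological overlay works on polygon geometry; here, as an assumption for illustration only, each layer is a dictionary that maps a cell identifier to its attributes, so that overlap reduces to shared keys. The layer contents are made up.

```python
def intersection(a, b):
    """Keep only cells present in both layers, with attributes combined."""
    return {k: {**a[k], **b[k]} for k in a.keys() & b.keys()}

def union(a, b):
    """Keep cells present in either layer, with attributes combined."""
    return {k: {**a.get(k, {}), **b.get(k, {})} for k in a.keys() | b.keys()}

def identity(a, b):
    """Keep every cell of the input layer a; add attributes of b where it overlaps."""
    return {k: {**a[k], **b.get(k, {})} for k in a}

slope = {(0, 0): {"steep": True}, (0, 1): {"steep": False}}
flood = {(0, 1): {"flood": True}, (1, 1): {"flood": True}}

both = intersection(slope, flood)
either = union(slope, flood)
kept = identity(slope, flood)
```

Conditions such as "steep and flood-prone" can then be evaluated on the combined attributes of the resulting cells.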



2.2 List of Procedures 

The List of Procedures stores the set of procedures for each of the analysis processes. 
It has a description of the required data type and the restrictions. However, the users 
can change the selection criteria. This provides a list of functions as an alternative for 
the analysis, in which the parameters can be modified. SAM has a wide range of 
applications beyond geomorphometric analysis; for example, it is also possible to perform 
the detection of landslide and flooding areas [11]. 



3 Generation of Topographic Ruggedness Layer 



The Terrain Ruggedness Index (TRI) is a measurement developed by Riley to represent 
the amount of elevation difference between adjacent cells of a digital elevation 
grid [12]. The process essentially computes the difference in elevation between a 
center cell and each of the eight cells immediately surrounding it. Then it squares each of 
the eight elevation differences to make them all positive and averages the squares. 
The terrain ruggedness index is then derived by taking the square root of this average, 
and corresponds to the average elevation change between any point on a grid and its 
surrounding area. The authors of the TRI propose a classification for the values 
obtained for the index (Table 1). 



Table 1. Terrain Ruggedness Index Classification 

TRI   Interval (m)   Represents 
1     0-80           Level terrain surface 
2     81-116         Nearly level surface 
3     117-161        Slightly rugged surface 
4     162-239        Intermediately rugged surface 
5     240-497        Moderately rugged 
6     498-958        Highly rugged 
7     959-4367       Extremely rugged surface 



The pseudo-code [12] to generate the TRI layer is: 

program TRI 
{ dem     - Input grid 
  tmp1    - Grid to store the sum of squared elevation differences 
  tmp2    - Grid to calculate the Topographic Ruggedness Index 
  tmp3    - Grid to verify the TRI range 
  outgrid - Output grid } 

/* Squared elevation differences with the eight neighbors, cell by cell. */ 
/* SQR squares its argument, as described in the text; SQRT is the square root. */ 
tmp1(x,y) := SQR(dem(x,y) - dem(x-1,y-1)) + SQR(dem(x,y) - dem(x,y-1)) + 
             SQR(dem(x,y) - dem(x+1,y-1)) + SQR(dem(x,y) - dem(x+1,y)) + 
             SQR(dem(x,y) - dem(x+1,y+1)) + SQR(dem(x,y) - dem(x,y+1)) + 
             SQR(dem(x,y) - dem(x-1,y+1)) + SQR(dem(x,y) - dem(x-1,y)) 

/* Square root, cell by cell */ 
tmp2(x,y) := SQRT(tmp1(x,y)) 

/* Limit the index to the upper end of the TRI range, cell by cell */ 
if (tmp2(x,y) >= 5000) then tmp3(x,y) := 5000 
else tmp3(x,y) := tmp2(x,y) 

/* Classify according to Table 1, cell by cell */ 
if (tmp3(x,y) >= 0   && tmp3(x,y) <= 80)   then outgrid(x,y) := 1 
if (tmp3(x,y) >= 81  && tmp3(x,y) <= 116)  then outgrid(x,y) := 2 
if (tmp3(x,y) >= 117 && tmp3(x,y) <= 161)  then outgrid(x,y) := 3 
if (tmp3(x,y) >= 162 && tmp3(x,y) <= 239)  then outgrid(x,y) := 4 
if (tmp3(x,y) >= 240 && tmp3(x,y) <= 497)  then outgrid(x,y) := 5 
if (tmp3(x,y) >= 498 && tmp3(x,y) <= 958)  then outgrid(x,y) := 6 
if (tmp3(x,y) >= 959 && tmp3(x,y) <= 5000) then outgrid(x,y) := 7 
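
For readers who want to experiment outside a GIS, the TRI computation of this section can be sketched in plain Python. The sketch follows the pseudo-code in summing the squared differences with the eight neighbors before taking the square root. Two choices are assumptions of this sketch rather than part of the original: border cells, which lack a full neighborhood, are left as `None`, and non-integer index values are assigned to a class by the upper bound of each interval in Table 1.

```python
import math

# Upper bounds of classes 1..6 from Table 1; anything above falls in class 7.
UPPER_BOUNDS = [80, 116, 161, 239, 497, 958]

def tri(dem, r, c):
    """Square root of the summed squared differences with the 8 neighbors."""
    centre = dem[r][c]
    total = sum((centre - dem[r + dr][c + dc]) ** 2
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0))
    return math.sqrt(total)

def tri_class(value):
    """Map a TRI value to its class number (1 to 7)."""
    for cls, upper in enumerate(UPPER_BOUNDS, start=1):
        if value <= upper:
            return cls
    return 7

def tri_layer(dem):
    """Classified TRI grid; border cells are left undefined (None)."""
    rows, cols = len(dem), len(dem[0])
    return [[tri_class(tri(dem, r, c))
             if 0 < r < rows - 1 and 0 < c < cols - 1 else None
             for c in range(cols)]
            for r in range(rows)]

# A single 30 m bump on a flat 100 m surface.
dem = [[100, 100, 100],
       [100, 130, 100],
       [100, 100, 100]]
value = tri(dem, 1, 1)
layer = tri_layer(dem)
```

For this bump the index is the square root of 8 x 30 squared, about 84.9, which falls in class 2 (nearly level surface).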



4 Generation of Drainage Density 

Drainage density is defined as the total length of channels divided by area and measures 
the degree to which a landscape is dissected by channels [13]. To generate the 
drainage density layer, it is necessary to build a regular grid of 1 km per cell [14]. 
Using this layer, we construct the centroid layer. Then the drainage layer is 
intersected with the grid layer. For each cell of the grid, the lengths per unit area are 
added into the centroid layer. The centroid layer is interpolated and the drainage density 
layer is obtained. Fig. 3 shows the process to generate the drainage density layer. 
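
The per-cell accumulation step can be approximated with the following Python sketch. Instead of an exact geometric intersection of streams with grid cells, it cuts each stream segment into short pieces and credits each piece's length to the cell containing its midpoint; this is a stated simplification, and the function name, the `step` parameter, and the example stream are all hypothetical. The final interpolation of the centroid values is not shown.

```python
import math

def drainage_density(polylines, x0, y0, cell_size, n_cols, n_rows, step=10.0):
    """Channel length per unit area for each grid cell (rows x cols)."""
    length = [[0.0] * n_cols for _ in range(n_rows)]
    for line in polylines:
        for (xa, ya), (xb, yb) in zip(line, line[1:]):
            seg_len = math.hypot(xb - xa, yb - ya)
            pieces = max(1, math.ceil(seg_len / step))
            for i in range(pieces):
                t = (i + 0.5) / pieces  # midpoint of the i-th piece
                col = int((xa + t * (xb - xa) - x0) // cell_size)
                row = int((ya + t * (yb - ya) - y0) // cell_size)
                if 0 <= row < n_rows and 0 <= col < n_cols:
                    length[row][col] += seg_len / pieces
    area = cell_size * cell_size
    return [[v / area for v in row] for row in length]

# One straight 2 km stream crossing two 1 km cells (coordinates in meters).
streams = [[(0.0, 500.0), (2000.0, 500.0)]]
density = drainage_density(streams, 0.0, 0.0, 1000.0, n_cols=2, n_rows=1)
```

Each cell receives 1000 m of channel over 1,000,000 square meters, that is, 0.001 m per square meter; these values would be attached to the cell centroids before interpolation.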



[Flow diagram boxes: Hydrologic Network; Grid Layer; Hydrologic Network ∩ Grid Layer; 
Centroids Layer; Frequency by cell; Interpolate function; Drainage Density] 

Fig. 3. Process to generate the drainage density layer 



5 Results 

Using SAM, we constructed drainage density and terrain ruggedness layers. The method 
has been applied to Tamaulipas State, Mexico. Some results are presented in this 
section. 

Fig. 4a shows the original DEM. The minimum value is 0 m, the maximum value is 
3496 m, the mean value of this layer is 227.40 m, and the standard deviation is 498.469. Fig. 
4b shows the terrain ruggedness layer constructed by SAM and the TRI classification 
of this area. The terrain index layer has the following values: the mean is 2.386 m and 
the standard deviation is 2.457. This means that Tamaulipas State has slightly rugged 
areas in its territory. The extremely rugged areas are principally concentrated in the 
southwestern part of Tamaulipas State. The DEM and TRI layers are composed of 8000 
rows and 2478 columns. 




Fig. 4. a) Digital Elevation Model, b) Terrain Ruggedness layer. 

Fig. 5a shows the hydrological layer, which contains all streams of Tamaulipas 
State (1:200,000). The drainage density layer is shown in Fig. 5b. The mean value of 
this layer is 24857, which is close to the lowest value. The concentrations are 
represented on a blue scale: dark blue represents higher concentrations and light blue 
represents lower concentrations. The highest concentrations of drainage 
are situated on the south coast, near Tampico City, while the lowest densities are 
found in the northwestern part of Tamaulipas State. 

Fig. 5. a) Hydrological Layer, b) Drainage Density 



6 Conclusion 

In this work, a GIS-application (SAM) has been developed to analyze geomor- 
phometric characteristics of geo-images. SAM detects drainage density and terrain 
ruggedness using raster image data. In this method, spatial and attribute data are used 
to generate raster data. Using SAM, it is possible to define the semantic importance of 
the characteristics of the spatial data. Users can modify the criteria to have different 
scenarios to improve the decision making process. 

Geomorphometric analysis is traditionally performed manually, using methods based 
on topographic map processing. Our approach significantly decreases 
the amount of time and effort required to quantify selected terrain characteristics. 
Other methods are designed to evaluate additional characteristics, which differ 
from the properties proposed in our approach. However, these methods can be integrated 
into SAM. 

The generation of drainage density and terrain ruggedness layers facilitates the 
extraction of spatial characteristics that can be used in other cartographic processes, for 
instance in generalization. 



Abstract. The process of making decisions in an autonomous system depends 
strongly on the information received from the environment 
in which it works. The path planning process for a mobile robot is a 
subset of the problems that have to be solved in an automatic decision 
system, where the appropriateness of the method is roughly constrained 
not only by obtaining well-suited information from 
the environment, but also by how this information is acquired. The main 
objective of this work is to describe a robust model for robotic path planning 
in unknown environments. The proposed general model is based 
on obtaining secure paths along which the robot will be able to move so as to 
complete its task. Thus, from an unknown location, the required information 
is collected by employing a mobile robot that moves freely through 
a real world with some generic behaviors that let the robot explore the 
world. 



1 Introduction 

The growing incorporation of robots into our environment [1] reflects the emerging 
proposal of systems that behave intelligently and, as a result, makes 
a great number of technical solutions available that improve the execution 
of specialized tasks. 

From an Artificial Intelligence point of view, robotic systems are also of 
enormous interest when the research is focused on the processes associated 
with intelligence, where intelligence refers to an essential feature of 
some complex systems, such as those that involve learning and autonomous 
control [2]. In this way, an autonomous robot can be considered an embedded 
system and, in general, an entity with some reactive and deliberative abilities 
and with adaptive behavior; consequently, it is related to dynamic environments. 

Multidisciplinary research in Robotics includes many different approaches 
to the methods for designing a robot. First, we have the classical point of 
view that utilizes a three-level architecture: Functional Level, Executive Level, 
and Planning Level, as can be seen in [3]. On the other hand, a more flexible 
characterization is based on some analogies with biological evolution [4]. As a 
consequence, the first approach implies that these architectures must be simplified 
quite often, since many requests have to be completed at the same time. On 
the contrary, the second contribution provides a specification based on behaviors 
that are modelled by means of complex control algorithms that cover a greater 
range of requests. 

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 651-658, 2003. 
(c) Springer-Verlag Berlin Heidelberg 2003 

J. Azorin et al. 

Nevertheless, most projects proposed for robotic prototypes include
architectures based on general design criteria in order to solve real-world
problems, and also incorporate special features for their adaptation to
specific environments [5], [6].

In this work, we propose a general model for the recognition of the world
with which a robot interacts, in order to follow a collision-free path within
the environment. First, we define the fundamental concepts for the
characterization of the environment and the acquisition of information, as
described in Section 2. After that, a robust model to generate a map,
considering some topological landmarks, is introduced in Section 3. Then,
Section 4 explains how the received information is processed and,
subsequently, Section 5 illustrates the experiments carried out to verify
that our research objectives are met. Finally, we conclude with some
remarks in Section 6.



2 Design of a Robust Model 

Let us consider a general approach to the situation described in the previous
section. First of all, the problem can be stated as obtaining safe paths along
which a mobile robot can move without collision, in order to fulfill some
objectives. That is, starting from an unknown world, the robot obtains
information that, once processed, leads to obstacle-free paths.

To do this, we need to establish theoretically the representation of the
factors that interact with the robot. Since our purpose is to obtain an
interpretation function that approximates the environment as well as
possible, the suitability of the solution will be mainly determined by the
quality of that representation. That is, we want to define a function that
synthesizes an image from the perception of the environment; as described in
the following sections, the synthesis uses Mathematical Morphology methods.

Thus, the algorithm that builds maps of the environment can be divided into
a series of independent steps, where each lower-level step provides the
necessary information for the upper-level processes. In this way, each level
can be considered an abstraction of the levels immediately below it.

As a result, this model gives a multi-level architecture that can be divided
into the following layers:

- Acquisition of information
    - Positional information
    - Topological information
    - Constraints of the environment
- Information processing
- Generation of knowledge

These levels are described in more detail next.

3 Acquisition of Information 

As one would expect, data acquisition depends both on the environment and
on a set of features that define the operation of the mobile robot. There are
two kinds of features:

- The internal sensing capabilities, which are used to obtain the robot's
  location in a real world.

- The external sensing capabilities, which acquire information by interacting
  with the environment (laser, sonar, cameras, ...).

These features are explained in the following sections.

3.1 Positional and Topological Information 

Considering the internal capabilities, the positional information is
determined mainly by the actual position of the robot in the workspace. Of
course, the robot's own knowledge of its location also has a great effect on
the final result.

On the other hand, the external capabilities are used to acquire topological
information; these data are extracted from the environment with which the
robot is interacting. The topological landmarks are related to the different
zones into which an environment can be divided. Each of these zones must be
located in the world map and, as a result, the landmarks establish a possible
positioning state of the robot. Let $(e_1, e_2, \ldots, e_n)$ be all the
possible states.

To do this, in our design we consider that the robot includes a sonar sensor
ring, as reading these sensors gives the system enough knowledge to
distinguish the topological zones into which a map is divided. Then, the
likeness function should be defined; starting from a group of sonar readings,
this function provides the probability that the sonar measurements have been
generated from each one of the states $e_i$. The likeness function is given
by:

$$ v(r \mid e_i) = \exp\left(-\frac{\mathrm{dist}(r, e_i)}{a}\right) \qquad (1) $$

where $a$ is a correction factor, such that $a > 1$, and $\mathrm{dist}$ is
the Euclidean distance between the current sensor reading
$r = (r_1, r_2, \ldots, r_n)$ and the representation of each state in the
database:

$$ \mathrm{dist}(r, e_i) = \sqrt{\sum_{j=1}^{n} (r_j - e_{ij})^2} \qquad (2) $$

where $e_{ij}$ denotes the $j$-th component of the stored representation of
state $e_i$.

It is also necessary to establish a model for the robot's movement, which
indicates the probability of either changing from one state to another or
remaining in the same state:

$$ m(e_i \mid e_j) \qquad (3) $$

This model has been implemented by using an FSM (Finite State Machine), so
that the probability of a transition between states must be estimated
according to the environment.

Finally, the acquisition function should be tested. This can be done as the
robot follows a path at constant velocity while avoiding obstacles. Then, we
define a probability function $c_t(e_j \mid r)$ in order to keep the robot in
the same state or to change it, if necessary, when a movement is produced:

$$ c_t(e_j \mid r) = a \, v(r \mid e_j) \sum_{i=1}^{n} m(e_j \mid e_i) \, c_{t-1}(e_i \mid r) \qquad (4) $$

where $a$ is a real constant.
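
Equations (1) to (4) can be read as a discrete Bayes-style filter over the
landmark states. The following minimal sketch illustrates one update step
under stated assumptions: the stored state representations (`prototypes`),
the transition table (`motion`), the value of the correction factor and the
treatment of the constant in Eq. (4) as a normalizer (so that the beliefs sum
to one) are all hypothetical choices made for illustration and are not taken
from the paper.

```python
import math

def dist(r, e):
    """Euclidean distance between a reading r and a stored state representation e (Eq. 2)."""
    return math.sqrt(sum((rj - ej) ** 2 for rj, ej in zip(r, e)))

def likeness(r, e, a=2.0):
    """Likeness v(r|e) = exp(-dist(r, e) / a), with correction factor a > 1 (Eq. 1)."""
    return math.exp(-dist(r, e) / a)

def update_belief(prev, r, prototypes, motion, a=2.0):
    """One step of Eq. (4).

    prev[i]      : c_{t-1}(e_i | r), belief in state i before the update
    motion[i][j] : m(e_j | e_i), probability of moving from state i to state j
    The constant of Eq. (4) is assumed here to be a normalizer.
    """
    n = len(prototypes)
    unnormalized = []
    for j in range(n):
        predicted = sum(motion[i][j] * prev[i] for i in range(n))
        unnormalized.append(likeness(r, prototypes[j], a) * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        # No state explains the reading at all: fall back to a uniform belief.
        return [1.0 / n] * n
    return [u / total for u in unnormalized]

# Hypothetical example: two states, three sonar values per reading.
prototypes = [[1.0, 1.0, 5.0], [1.0, 5.0, 5.0]]
motion = [[0.8, 0.2], [0.2, 0.8]]
belief = update_belief([0.5, 0.5], [1.0, 1.2, 5.0], prototypes, motion)
```

In this example the reading is much closer to the first stored representation,
so the updated belief favors the first state.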



3.2 Topological Landmarks 

From the former discussion, it is clear that each landmark identifies a
locating state $(e_1, e_2, \ldots, e_n)$. Let us consider the environment
defined in Fig. 1. In this case, the possible landmarks are:

- Corridor: e1
- Corner: e2
- T: e3
- Crossing: e4



Fig. 1. Proposal of environment



Therefore, in this workspace we can find the four landmarks depicted in Fig. 2.



Fig. 2. The four topological landmarks



The experiments use a robot equipped with 24 sonar sensors. Each reading
from the sensors is computed and, afterwards, the robot is trained for each
landmark that has been previously identified in the map. To do this, the
robot is placed at each landmark and all the possible readings are obtained.
From that information, an image of the environment can be created and, as a
result, a map is generated; this is explained in the next section.



4 Information Processing and Generation of Knowledge 

The next level in the architecture processes the data once the information
has been received from the lower level. As pointed out before, the collected
data come from a group of sonar sensors which provide a set of points (i.e., a
reading) that represents the environment.

In order to obtain a reliable image from this set of readings, we propose to
use a morphological dilation of the obtained points to extract an estimated
map (see Fig. 3). The dilation is carried out considering both the robot
architecture and its physical dimensions. The objective is to find a set of
points that provides free paths for collision-free navigation and, as a
result, a synthesized image that contains a map of the environment.
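
As an illustration of the dilation step, the following sketch dilates a set of
grid cells with a discrete disk. It is only a sketch under assumptions: the
occupancy-grid discretization, the disk-shaped structuring element and the
radius standing in for the robot's physical dimensions are hypothetical
choices; the paper does not specify the structuring element it uses.

```python
def disk(radius):
    """Offsets (dx, dy) of a discrete disk used as the structuring element."""
    return [(dx, dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if dx * dx + dy * dy <= radius * radius]

def dilate(points, radius, width, height):
    """Binary dilation of a set of grid cells by a disk, clipped to the grid."""
    element = disk(radius)
    result = set()
    for (x, y) in points:
        for (dx, dy) in element:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                result.add((nx, ny))
    return result

# Hypothetical reading: three acquired points on an 8 x 5 grid,
# dilated with a radius of one cell.
reading = {(2, 2), (3, 2), (4, 2)}
region = dilate(reading, radius=1, width=8, height=5)
```

Every acquired point is grown into a small region, so a sequence of readings
along a path produces a connected area rather than isolated points.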




Fig. 3. Dilation of a sensor reading 



It is important to take into account the positional and topological
information obtained, so that the dilation creates separate regions for the
different landmarks and, consequently, provides a better data representation
for the path planning process (Fig. 4).



Fig. 4. Dilation of points considering topological information



Finally, the highest level includes the generation of an image from the set of 
regions that are labelled according to the landmarks that exist throughout the 
environment. This process can also be completed by means of the dilation of the 
sensor reading points. In the following section we describe some examples of the 
application of these operations. 

5 Experiments 

Let us now consider the results of some experiments carried out with our
model. The tests have been simulated, with the robot assumed to work in the
environment shown in Fig. 1. As shown before, this world consists of a set of
points labelled as corridor, corner, T and crossing points (see Fig. 5 (a)).

Let us assume that the robot moves along a path established by a generic
pattern of movement. Then, the positional and topological information is
extracted while acquiring points along a planned trajectory, as shown in
Fig. 5 (b).



Fig. 5. (a) Landmark labelling for an environment, (b) Sample path for the
considered environment



In this way, when the mobile platform moves along a path, the information
acquisition level provides a series of points where the robot is able to move
(see Fig. 6).

As the robot receives the data, the information processing system synthesizes
the map of the environment in real time, considering every set of points
received.



Fig. 6. Results of the acquisition level



Consequently, once the system has labelled the points according to one of the 
possible landmarks, the morphological dilation provides the regions where an 
object collision has a low probability (Fig. 7). 




Fig. 7. Results of the information processing level 



The last images show that the synthesis is completed with an iterative
method, in which the information that the sensors provide in real time yields
a good-quality map of the environment without consuming many resources. As
we can see, the main goals of our research have been accomplished.

6 Conclusions 

In general terms, the path planning process for a mobile robot is strongly
influenced by the precision of the acquisition process. It is affected both
by the quality of the information obtained from the environment and by the
attributes of the system and of the environment in which it works.

In this paper, we have proposed a model for the generation of a map in
unknown environments; we have then designed an information acquisition
process that provides a representation of the environment for the path
planning process. The experiments show that the prototype is quite robust and
could be applied in some indoor environments.

As future work, it would be desirable to apply the model in a real
environment, as well as to consider new simulation experiments with
different environments. This will lead to a more accurate design method, so
that the robot's internal hardware can be efficiently implemented. As a
result, it could support all the possible requests that the model would have
in a real situation and, finally, the acquisition function could be
generalized to other sensing models. Therefore, we could achieve more
robustness in the positional and topological acquisition of information and,
consequently, in the higher levels of the architecture.



Abstract. In this paper we propose a method for transforming a 3D map of the
environment, composed of a cloud of millions of points, into a compact
representation in terms of basic geometric primitives, 3D planes in this
case. These planes, with their texture, yield a very useful representation
for robot navigation tasks like localization and motion control. Our method
estimates the main planes in the environment (walls, floor and ceiling) using
point classification, based on the orientation of the points' normals and
their relative positions. Once we have inferred the 3D planes we map their
textures using the appearance information of the observations, obtaining a
realistic model of the scene.



1 Introduction 

Perception is a critical element in robot navigation tasks like map building
(mapping) and self-localization. The quality of the map and its
post-processing are key to performing these tasks successfully. Early mapping
solutions were based on 2D information extracted with sonars [1]. In these
cases, the environment is modeled with an occupancy grid [2]. In [3] [4] the
3D grids are inferred from point clouds extracted with stereo vision. As
these clouds typically have millions of points, managing them is impractical
both in terms of data storage and efficiency. Moreover, it is desirable to
obtain representations at a higher level of abstraction. Thus, following the
idea of “from pixels to geometric primitives”, the approach introduced in [5]
applies the Hough transform to find the vertical planes (walls) of the
environment from stereo data. However, in this case a high-resolution
partitioning of the parametric space, which feeds the voting process, is
required to find a good approximation.

Planes have also been estimated using 3D range sensors like laser scanners,
which produce denser information. In [6] an adaptation of the EM algorithm
[7] is proposed for detecting planar patches in an indoor environment. The
approach proposed in [8] combines range information and appearance to recover
planar representations of outdoor scenes (buildings). However, these two
latter approaches require very dense sensors. Here we focus on the case of a
stereo sensor, which typically produces very noisy, sparse information that
is highly concentrated in highly textured areas. In this paper we obtain the
main planes of the scene (walls, floor and ceiling) assuming that the robot
is moving in a plane-parallel environment. In order to do so, we first group
the 3D points in the map (see [9] and [10] for a complete description of our
map-building process) using the direction of their normals. Then, we fit a 3D
plane to each group, and finally we perform texture mapping using the
information of the initial observations coming from many points of view.



2 Sensor and Robot Models 

In this paper we use the Digiclops trinocular stereo system mounted on a
Pioneer mobile robot controlled with the Saphira library. Given these
elements we define an observation at time $t$, that is $v_t$, as the set of
3D observed points $(p_{ij}, n_{ij}, c_{ij})$ collected in the matrix
$[v_{ij}]$, where $p_{ij}$ are the coordinates of a given point, $n_{ij}$ is
a normal vector which has to be estimated, and $c_{ij}$ is the grey level or
color of the point.

Assuming that the robot moves over a plane and that the focal axis of the
camera is always parallel to this plane, the state or pose of the robot at
time $t$ is given by the robot's coordinates in the XZ plane and its relative
angle with respect to the Y axis, that is $\varphi_t = (x_t, z_t, \alpha_t)$.
Similarly, an action performed by the robot at time $t$ is defined in terms of
the increment of the current pose,
$a_t = (\Delta x_t, \Delta y_t, \Delta \alpha_t)$, and a trajectory performed
by the robot is the sequence of $t$ observations
$V^t = \{v_1, v_2, \ldots, v_t\}$ and $t$ associated actions
$A^t = \{a_1, a_2, \ldots, a_t\}$.

Actions can be robustly estimated from observations, and by integrating the
trajectory performed by the robot through robust matching and alignment we
obtain a map consisting of a cloud of millions of 3D points (see [9] and [10]
for more details).

3 Estimating Point Normals

Here we focus on estimating the surface normal $n_{ij}$ for each point
$p_{ij}$ of a given observation $v_t$. In order to do that we consider the 4
or 8 neighbors of the point in the observation matrix $[v_{ij}]$, that is, we
exploit the 2D layout of the points (see Figure 1). In order to improve
robustness, instead of considering each neighboring point, we consider
neighboring regions of size 1. For each region $R_i$ we take its centroid
$r_i$. Then, given the considered point $p_{ij}$ and the centroids
$\{r_1, r_2, \ldots, r_n\}$ of the $n$ neighboring regions, we build the
vectors $a_i = r_i - p_{ij}$. The normal results from taking the cross product
of adjacent vectors in counterclockwise order and averaging:

$$ n_{ij} = \frac{(a_n \times a_{n-1}) + (a_{n-1} \times a_{n-2}) + \ldots + (a_1 \times a_n)}{n} \qquad (1) $$

As the quality of this estimate depends on the number of valid 3D points
inside a given region, we consider the resulting normal undefined when there
is not enough information to provide a robust estimate.



Fig. 1. Estimating the normal at a point using 8 regions: neighboring regions
of size 5x5 (left), centroid of each region and associated vector (center),
normal vector resulting from applying Expression 1 (right).
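
The normal estimate of Expression 1 can be sketched as follows. This is a
minimal illustration under assumptions: region centroids are passed in the
order $r_1, \ldots, r_n$, a region without enough valid 3D points is
represented by `None`, such regions are simply skipped, and the `min_regions`
threshold below which the normal is declared undefined is a hypothetical
parameter that the paper does not specify.

```python
def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def estimate_normal(p, centroids, min_regions=3):
    """Average of the cross products of adjacent centroid vectors (Expression 1).

    p         : the 3D point p_ij
    centroids : region centroids r_1..r_n, or None for regions lacking data
    Returns None when too few regions are valid (normal considered undefined).
    """
    valid = [r for r in centroids if r is not None]
    n = len(valid)
    if n < min_regions:
        return None
    a = [tuple(rc - pc for rc, pc in zip(r, p)) for r in valid]
    # Terms a_n x a_{n-1}, ..., a_2 x a_1, followed by the closing term a_1 x a_n.
    pairs = [(a[k], a[k - 1]) for k in range(n - 1, 0, -1)] + [(a[0], a[n - 1])]
    total = [0.0, 0.0, 0.0]
    for u, v in pairs:
        c = cross(u, v)
        for d in range(3):
            total[d] += c[d]
    return tuple(t / n for t in total)

# Four centroids around the origin, all lying in the z = 0 plane.
p = (0.0, 0.0, 0.0)
ring = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
normal = estimate_normal(p, ring)
```

For a flat neighborhood the result is perpendicular to the plane; its sign
depends on the order in which the centroids are listed.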



4 Vertical Planes Estimation 

4.1 Removing Horizontal Planes 

Assuming that the floor is flat and that the height of the camera is
constant, and considering that the floor and ceiling planes usually have
little texture, so that their associated stereo points are typically very
noisy, we remove these planes and focus on the vertical ones (walls) (see
Figure 2).




Fig. 2. Removing the floor and the ceiling. Complete scene with those planes (left). 
Resulting scene after removing the planes (right). 



Once we have only vertical planes, the problem of estimating these planes can 
be posed in terms of finding in 2D the segments resulting from their projections 
on the imaginary horizontal plane. In order to do that, we will build a Gaussian 
Mixture Model classifier for 2D normals where each class is given by the set of 
points with similar normals (associated to parallel walls). Next, we build the 
planes associated to each class with a connected-components process. 

4.2 Gaussian Mixture Model Classifier 

Our one-dimensional mixture-of-Gaussians classifier [11] is built on a set of
$n$ samples $X = \{x_1, x_2, \ldots, x_n\}$ that we want to fit with $k$
Gaussian kernels with unknown parameters
$\{(\mu_1, \sigma_1), (\mu_2, \sigma_2), \ldots, (\mu_k, \sigma_k)\}$. We must
estimate the parameters that maximize the log-likelihood function:

$$ \mathcal{L} = \log \prod_{i=1}^{n} \sum_{j=1}^{k} \pi_j P(x_i \mid j) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \pi_j P(x_i \mid j) \qquad (2) $$

where $\pi_j$ is the prior probability of belonging to kernel $j$ and
$P(x_i \mid j)$ is the probability of $x_i$ under the Gaussian centered on
kernel $j$. In order to find the prior probabilities and the parameters of the
kernels we apply the standard EM (Expectation-Maximization) algorithm [7].

In the E-step (Expectation) we update the posterior $P(j \mid x_i)$, that is,
the probability that a pattern $x_i$ is generated by kernel $j$:

$$ P(j \mid x_i) = \frac{\pi_j P(x_i \mid j)}{\sum_{l=1}^{k} \pi_l P(x_i \mid l)} \qquad (3) $$

In the M-step (Maximization) we proceed to update the priors and the pa- 
rameters of the kernels given the posteriors computed in the E-step: 

Alternating E and M steps, the algorithm converges to the local maximum
closest to the initialization point. Then, we take the MAP estimate for each
normal: $MAP(x_i) = \arg\max_j P(j \mid x_i)$, where
$P(j \mid x_i) = \pi_j P(x_i \mid j) / P(x_i)$.
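
The E and M steps can be sketched for one-dimensional data as follows. This is
a minimal illustration, not the authors' implementation: the M-step uses the
standard EM updates for a Gaussian mixture (priors, means and standard
deviations re-estimated from the posteriors), which is an assumption because
the update formulas are not reproduced in the text; the sample angles, the
fixed initialization, the iteration count and the `min_sigma` floor are
hypothetical choices.

```python
import math

def gaussian(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_gmm_1d(samples, means, sigmas, iterations=50, min_sigma=1e-3):
    """Fit k 1-D Gaussian kernels with EM; returns (priors, means, sigmas)."""
    k, n = len(means), len(samples)
    priors = [1.0 / k] * k
    means, sigmas = list(means), list(sigmas)
    for _ in range(iterations):
        # E-step: posterior P(j | x_i) for every sample (Eq. 3).
        posteriors = []
        for x in samples:
            w = [priors[j] * gaussian(x, means[j], sigmas[j]) for j in range(k)]
            total = sum(w)
            posteriors.append([wj / total for wj in w] if total > 0 else [1.0 / k] * k)
        # M-step (standard updates, assumed): re-estimate priors, means, sigmas.
        for j in range(k):
            nj = sum(p[j] for p in posteriors)
            priors[j] = nj / n
            if nj > 1e-12:
                means[j] = sum(p[j] * x for p, x in zip(posteriors, samples)) / nj
                var = sum(p[j] * (x - means[j]) ** 2
                          for p, x in zip(posteriors, samples)) / nj
                sigmas[j] = max(math.sqrt(var), min_sigma)
    return priors, means, sigmas

def map_class(x, priors, means, sigmas):
    """MAP estimate: index of the kernel with the largest posterior for x."""
    return max(range(len(means)),
               key=lambda j: priors[j] * gaussian(x, means[j], sigmas[j]))

# Hypothetical normal angles (degrees) from two families of parallel walls.
angles = [9.0, 10.0, 11.0, 89.0, 90.0, 91.0]
priors, means, sigmas = em_gmm_1d(angles, means=[0.0, 100.0], sigmas=[20.0, 20.0])
labels = [map_class(x, priors, means, sigmas) for x in angles]
```

The sketch treats angles as plain real numbers; values near the 0/360 boundary
would need a circular treatment, which is left out here.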

4.3 Classifying Normals 

In order to classify each normal, we take its relative angle with respect to
the reference vector (1,0,0) in the XZ plane. In Figure 3 we show an example
of classification with real data. We represent the original point cloud with
the normal of each point, and the directional histogram with four peaks
associated with the four types of parallel planes. We also show how the
points of the scene are classified. We have used $k = 4$ kernels whose means
have been randomly initialized in the interval [0,360].

When fewer than 4 classes are present (for instance, we may have only two
classes when the robot is in the middle of a corridor), the algorithm also
converges, because in this case the prior probabilities of the non-existent
classes tend to zero. We illustrate this case in Figure 4. Finally, we also
pre-filter noisy patterns (normals, in this case) in order to avoid
distortions in the final result.
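
The angle used as the one-dimensional feature can be computed as in the
following sketch. The function name and the sign convention (angle measured
from the X axis toward the Z axis, reported in degrees in [0, 360)) are
assumptions made for illustration; the paper does not state its convention.

```python
import math

def normal_angle_deg(normal):
    """Angle in [0, 360) between (1, 0, 0) and the XZ projection of a 3D normal."""
    nx, _, nz = normal  # the Y component is ignored for vertical walls
    return math.degrees(math.atan2(nz, nx)) % 360.0

# Normals of four hypothetical walls facing +X, +Z, -X and -Z.
walls = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
wall_angles = [normal_angle_deg(n) for n in walls]
```

In an ideal plane-parallel scene these four directions give the four peaks of
the directional histogram.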

4.4 Fitting Vertical Planes 

Once we have found the $k$ classes $C = \{c_1, c_2, \ldots, c_k\}$ associated
with the types of wall appearing in the scene, each $c_j$ contains a set of
points $\{p_i^j\}$ with similar normals. Next, we proceed to divide these sets
into different vertical planes.



Fig. 3. Classification example using the EM algorithm. 2D point cloud for a
given scene and their normals (left). Directional histogram and final kernels
(center). Final kernel distributions (right).



Fig. 4. Initial classification of the scene in Figure 2. Normals (left) and
final position of the 4 kernels. The prior probabilities are 0.0, 0.42, 0.0
and 0.58 respectively.



Given two points $p_a^j$ and $p_b^j$ of class $c_j$, we consider that they
belong to the same plane when the distance between them in the XZ plane is
below a given threshold $\lambda$: $\|p_a^j - p_b^j\|_{XZ} < \lambda$. Given
this binary relation we build a graph $G_j(V, A)$ whose vertices are
associated with the points in the class and whose edges are associated with
pairs of vertices that satisfy the previous relation. Then we calculate the
connected components of this graph, which represent the vertical planes.
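
The grouping step can be sketched with a union-find structure over the
thresholded distance graph. This is an illustrative sketch only: the function
name, the sample coordinates and the threshold value are hypothetical, and the
all-pairs loop is quadratic in the number of points, so a real map with
millions of points would need a spatial index, which is not shown.

```python
import math

def vertical_plane_groups(points_xz, threshold):
    """Connected components of the graph that links points closer than `threshold`.

    points_xz : list of (x, z) coordinates of the points of one class
    Returns a sorted list of components, each a list of point indices.
    """
    parent = list(range(len(points_xz)))

    def find(i):
        # Find the representative of i, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points_xz)):
        for j in range(i + 1, len(points_xz)):
            dx = points_xz[i][0] - points_xz[j][0]
            dz = points_xz[i][1] - points_xz[j][1]
            if math.hypot(dx, dz) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points_xz)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two hypothetical wall fragments with the same orientation, 4 units apart.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 0.0), (5.4, 0.0)]
groups = vertical_plane_groups(points, threshold=0.6)
```

Points that share a normal class but are far apart end up in different
components, which is how parallel walls are separated.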

Once we have computed the connected components we must estimate the
parameters of the vertical planes and their bounds. We consider the set of
points with their normals $\{(p_1, n_1), (p_2, n_2), \ldots, (p_l, n_l)\}$ that
define a given plane $\psi$. We take as point and normal of the plane the
centroid and the average normal, respectively: $\psi = (\bar{p}, \bar{n})$.
The plane's bounds are obtained by computing the orthogonal plane
$\psi^{\perp} = (\bar{p}, \bar{n}^{\perp})$; these bounds are determined by the
most distant points from this orthogonal plane (see Figure 5).

Finally, to consider a plane valid, it is necessary to verify that it is
sufficiently long and that it contains enough points.
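
A minimal 2D sketch of this parameter estimation follows. It assumes each
connected component is given as lists of (x, z) points and (x, z) normals; the
function names and the validity thresholds `min_length` and `min_points` are
hypothetical, since the paper does not give its values.

```python
import math

def fit_wall(points, normals):
    """Centroid, unit average normal and end points of a wall segment in XZ."""
    m = len(points)
    cx = sum(p[0] for p in points) / m
    cz = sum(p[1] for p in points) / m
    nx = sum(n[0] for n in normals) / m
    nz = sum(n[1] for n in normals) / m
    norm = math.hypot(nx, nz)
    nx, nz = nx / norm, nz / norm
    # Direction along the wall: the average normal rotated by 90 degrees.
    tx, tz = -nz, nx
    # Signed distance of each point to the orthogonal plane through the centroid.
    offsets = [(p[0] - cx) * tx + (p[1] - cz) * tz for p in points]
    lo, hi = min(offsets), max(offsets)
    start = (cx + lo * tx, cz + lo * tz)
    end = (cx + hi * tx, cz + hi * tz)
    return (cx, cz), (nx, nz), start, end

def is_valid_wall(points, start, end, min_length=1.0, min_points=3):
    """Accept a wall only if it is long enough and supported by enough points."""
    length = math.hypot(end[0] - start[0], end[1] - start[1])
    return length >= min_length and len(points) >= min_points

# Hypothetical noisy wall along the X direction, facing +Z.
wall_points = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (3.0, 1.0)]
wall_normals = [(0.0, 1.0)] * 4
centroid, normal, start, end = fit_wall(wall_points, wall_normals)
valid = is_valid_wall(wall_points, start, end)
```

The two extreme offsets along the wall direction give the bounds of the
segment, which corresponds to taking the points farthest from the orthogonal
plane on each side.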

Once we have computed the vertical planes in 2D, we must give them their
height in 3D, using the floor and ceiling heights (see Figure 6). On the
other hand, the bounds of the floor and ceiling planes are calculated using
the bounding box of the set of vertical planes.



Fig. 5. Determining the parameters (point, normal and bounds) of each plane.



Fig. 6. Planes detected from the example of Figure 3. Vertical planes in 2D
(left) and corresponding 3D planes, including the horizontal planes. The
algorithm detects 8 vertical planes.



5 Plane Texturization 

Once we have found the horizontal and vertical planes, we must texturize them
using the appearance information of the observations (reference images).

Each plane defines a rectangular region of space which can be traversed
parametrically in two directions, horizontal ($\alpha$) and vertical
($\beta$). Each 3D point $p_{\alpha\beta}$ of the plane can be observed in any
of the $t$ observations $\{v_1, v_2, \ldots, v_t\}$, whose respective poses
$\{\varphi_1, \varphi_2, \ldots, \varphi_t\}$ are known.

Using the fundamental matrix of the camera, we project the point onto each
image. Then, we look up the pixel color in each image, obtaining a set of
color candidates for this point $\{c_1, c_2, \ldots, c_t\}$. We must reject
the candidates of this set that are not visible (because there is a vertical
plane between the projection and the 3D point). The final color of the point
is the candidate closest to the average $\bar{c}$ of the set:
$\arg\min_{c_i} |c_i - \bar{c}|$.
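
The final color selection can be sketched as follows for grey levels. The
function name is hypothetical, the candidates are assumed to have already been
filtered by visibility, and the handling of an empty candidate set (returning
`None`) is an assumption rather than something stated in the paper.

```python
def closest_to_average(candidates):
    """Return the candidate grey level closest to the mean of the candidates."""
    if not candidates:
        return None  # the point is not visible in any observation
    mean = sum(candidates) / len(candidates)
    return min(candidates, key=lambda c: abs(c - mean))

# Hypothetical grey levels of one plane point seen in three observations.
color = closest_to_average([100, 104, 180])
```

Choosing an observed value rather than the mean itself keeps the texture free
of colors that never appeared in any image, although a strong outlier still
shifts the average.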

The method is able to obtain a quite realistic texture. Nevertheless, an
inherent problem arises with objects in the scene that do not fit any plane.
We have observed a smoothing effect on these objects when they are captured
from different points of view (see the experiments section).



6 Experiments and Validation

In this section, we present a complete experiment in which we have estimated
the planes in a real scene. The scene is composed of 68 observations taken
along an indoor environment located in the facilities of our department. The
original point cloud contains 2,025,666 points in 3D (Figure 7 left).




Fig. 7. Original point cloud of the experiment (left). Points and normals after hori- 
zontal planes removal (top right). Vertical planes detected (Bottom right). 



Using the proposed algorithm, we obtain 11 vertical planes (Figure 7 bottom
right). Their corresponding textures, as well as those of the ceiling and
floor, have been stored in different images. The complete scene (geometric
information of the planes and their corresponding textures) occupies 980 KB
of disk space. In comparison, the original point cloud mentioned previously
occupies 31,652 KB of disk space, a reduction by a factor of about 32.

In Figure 8 we show several 3D views of the textured model. 




Fig. 8. Several 3D views of the final 3D scene.



7 Conclusions and Future Work

In this work, we present an algorithm to estimate the principal 3D planes of
a scene composed of a set of stereo observations. We have proposed a two-step
point classifier based on the normals and the relative positions of the
points. Finally, we present an algorithm to estimate the texture of each
plane using the appearance information of the observations.

We are currently investigating the construction of non-planar primitives from
these planes. Our initial idea is, after estimating the planes, to model each
one with a free approximation surface. Using this technique, we could obtain
more realistic scenes. On the other hand, we are interested in detecting
other basic primitives like cylinders and spheres.



References 

1. S. Thrun et al. “Probabilistic Algorithms and the interactive museum tour-guide 
robot Minerva”. International Journal of Robotics Research Vol 19 No. 11. Novem- 
ber 2000. 

2. D. Fox, W. Burgard, S. Thrun. “The dynamic window approach to collision
avoidance”. IEEE Robotics and Automation Magazine, 1997.

3. H.P. Moravec. “Robot spatial perception by stereoscopic vision and 3D evidence 
grids”. TR The Robotics Institute Carnegie Mellon University. Pittsburgh, Penn- 
sylvania, 1996. 

4. S. Se, D. Lowe, J. Little. “Vision-based mobile robot localization and mapping using 
scale-invariant features” Proc. of IEEE International Conference on Robotics and 
Automation. Seoul, Korea May 2001. 

5. L. Iocchi, K. Konolige, M. Bajracharya. “Visually realistic mapping of planar
environment with stereo”. Seventh International Symposium on Experimental
Robotics (ISER’2000). Hawaii 2000.

6. Y. Liu, R. Emery, D. Chakrabarti, W. Burgard, S. Thrun. “Using EM to learn
3D models of indoor environments with mobile robots”. Eighteenth International
Conference on Machine Learning. Williams College, June 2001.

7. A. Dempster, N. Laird, D. Rubin. “Maximum likelihood from incomplete data via
the EM algorithm”. Journal of the Royal Statistical Society, Series B 39(1): 1-38. 1977.

8. I. Stamos, M. Leordeanu. “Automated Feature-Based Range Registration of Urban 
Scenes of Large Scale”. IEEE International Conference of Computer Vision and 
Pattern Recognition, pp. 555-561, Vol. II, Madison, WI, 2003. 

9. J.M. Saez, F. Escolano “Monte Carlo Localization in 3D Maps Using Stereo Vi- 
sion”. In: Garijo, F.J., Riquelme, J.C., Toro, M.(eds.): Advances in Artificial Intel- 
ligence - Iberamia 2002. Lecture Notes in Computer Science, Vol. 2527. Springer- 
Verlag, Berlin Heidelberg New York (2002) 913-922. 

10. J.M. Saez, A. Penalver, F. Escolano. “Estimación de las acciones de un robot
utilizando visión estéreo”. IV Workshop de Agentes Físicos (WAF 2003). Alicante,
April 2003.

11. R.A. Redner, H.F. Walker. “Mixture Densities, Maximum Likelihood, and the EM 
Algorithm”. SIAM Review, 26(2): 195-239, 1984. 



An Oscillatory Neural Network for Image Segmentation 



Denis Fernandes¹ and Philippe Olivier Alexandre Navaux²

¹ PUCRS - Faculdade de Engenharia - Pontifícia Universidade Católica do Rio Grande do Sul,
Av. Ipiranga, 6681, 90619-900 Porto Alegre, RS, Brazil
denis@ee.pucrs.br

² UFRGS - Instituto de Informática - Universidade Federal do Rio Grande do Sul, Av. Bento
Gonçalves, 9500, 91501-970 Porto Alegre, RS, Brazil
navaux@inf.ufrgs.br



Abstract. Oscillatory neural networks are a recent approach to image
segmentation. Two positive aspects of such networks are their massively
parallel topology and their capacity to separate the segments in time. On the
other hand, the oscillatory networks proposed so far have limitations that
restrict their practical application, such as the use of differential
equations, which implies high complexity for implementation in digital
hardware, and limited segmentation capacity. In the present paper, an
oscillatory neural network suitable for image segmentation in digital vision
chips is presented. This network offers several advantages, including
unlimited segmentation capacity. Preliminary results confirm the successful
operation of the proposal in image segmentation and its good potential for
real-time video segmentation.



1 Introduction 

The increasing demand for artificial vision systems, which implement complex algo- 
rithms with high speed, justifies the development of vision chips [10] or silicon reti- 
nas [3]. In these chips, the photo detectors corresponding to the pixels of the image 
are jointly integrated with a massively parallel network of processing elements (PEs) 
for execution of specific operations over the input image. Analog implementation of 
vision chips requires simpler circuits than the digital one, presenting, on the other 
hand, lower flexibility to reprogram the executed function [10]. So, when flexibility is 
required, the digital implementation is more attractive. 

Recently, alternative topologies of artificial neural networks, the oscillatory
neural networks, which are inspired by the segmentation mechanism of the human
brain, have been applied to image segmentation with satisfactory results
[4][6][11][14]. The study of these networks is a fertile field of work, as is
the development of dedicated architectures for efficient implementation [2][6].

The LEGION network (Locally Excitatory Globally Inhibitory Oscillator Network)
[15] is the most consistent proposal of an oscillatory neural network for image
segmentation found in the literature. Its applications include segmentation of
remote sensing images [12], medical images [8][13], and electron microscope
images [7]. An interesting aspect of this network is its capability to separate
the segments of the image in time, facilitating later identification and
quantification. Its massively parallel nature is well suited to the
implementation of vision chips for image segmentation. The negative aspects
include high computational complexity for implementation in digital hardware,
as a consequence of the use of differential equations, limited segmentation
capacity, and the large number of parameters, which are not intuitive to set.

This paper introduces a new model of oscillatory neural network inspired by
the LEGION network, suitable for image segmentation applications and for
implementation in vision chips with digital technology. The network has a
massively parallel topology and is able to separate the segments in time. Its
low complexity and the absence of a limit on the number of segments to be
discriminated are advantages of the network. The use of few parameters with
intuitive settings is also a positive aspect, as are the reduced number of
iterations and the easy predictability of the time needed to reach the
results. The digital structure also offers flexibility for the easy
implementation of more sophisticated segmentation procedures, using different
attributes of the image. Results obtained in practical implementations are
presented, showing that the proposed network operates as expected and has
potential for efficient real-time video segmentation.



2 The Proposed Oscillatory Neural Network 

In the late 1980s, oscillations of approximately 40 Hz were discovered in the visual
cortex of the human brain. Such oscillations are strongly correlated with the visual
stimulus, and phase synchronism occurs between physically near neurons that
receive similar stimuli, which can characterize a homogeneous region of the perceived
image. Physically near neurons that receive different stimuli, or physically distant
neurons, do not present such phase synchronism [4][16].

A new oscillatory neural network was conceived [9] using the property of local
synchronism between neurons and adding a mechanism of global inhibition,
implemented with local connections, to obtain anti-synchronism among different groups of
neurons. The proposed network is suitable for image segmentation and for
implementation in digital hardware with a massively parallel topology.



2.1 Structures of Connections

The proposed network is implemented in a two-dimensional topology with the same
size as the image to be segmented. Two structures of connections among neurons,
called excitatory connections and inhibitory connections, are used.

Figure 1(a) presents an example of the excitatory connection structure. A neuron 
has its excitatory output simultaneously connected to the 8 nearest neurons. So, the 
excitatory output of a neuron will be activated when at least one of its nearest neigh- 
bors, with similar input, is active. 

Figure 1(b) presents an example of the inhibitory connection structure adopted.
Each neuron has its inhibitory output connected to only one neighboring neuron.
This structure establishes a priority order, and neurons with higher priority inhibit the
remaining ones. The excitatory connections then cause the single neuron qualified by the
inhibitory connections to excite the other neurons belonging to the same segment. The
inhibitory output of the neuron with the lowest priority will be active only when at
least one neuron is active. In a practical implementation, this signal can be used to
detect images without any segment present at the output of the network.

Fig. 1. Examples of structures of excitatory connections (a) and inhibitory connections (b)



2.2 The Network Neuron 

The basic idea behind the proposed neuron consists of associating a
binary counter with each neuron in such a way that neighboring neurons with similar inputs
are synchronized in the same state, different from the states of the other groups
(Figure 2). The neurons stay inactive until their counters reach a predefined state.




Fig. 2. Basic idea behind the conception of the proposed neuron 



Figure 3 presents the internal logic structure of the proposed neuron. The defini- 
tions of the constants and variables are listed below: 

• total number of neurons in the network; 

• N;. maximum number of segments allowed to be separated by the network; 

• L^\ threshold for weight determination; 

• state of the internal binary counter belonging to the neuron placed at line i
and column j at time (iteration) t;

• output signal of the neuron internal comparator; 

• vj(i,j,t): excitatory output of the neuron; 
• vji,j,t): inhibitory output of the neuron;

• v„{i,j,t)\ inhibitory input of the neuron; 

• v,(i,j,t): leader indication signal; 

• input signal of the neuron representing a set of features of the related pixel; 

• w(i,j,k,l,t): weight related to the comparison between the input of the neuron placed
at line i and column j and the input of the neuron placed at line k and column l.




Fig. 3. Neuron internal structure of the proposed network

A binary counter with states defines the neuron state as a function of the time. 
The counter receives a synchronous reset if the neuron is active (v/ij,t)=l), otherwise 
its state is increased until NA, remaining there until it is reset. The use of states 
makes the discrimination of a maximum of segments possible, even though, in this 
case, there is no similarity between any inputs of neighboring neurons. The proposed 
neuron structure qualifies the network to separate the first segments according to 
the sequence established by the structure of inhibitory connections. 

The leader signal can be generated internally or externally. Only neurons whose
leader signal is 1 are initially qualified to pass to the active phase (leaders). Neurons
whose leader signal is 0 can only be activated through an active neighbor with similar
input. In external generation, criteria such as the position in the image can be
established. For example, it could be established that neurons in the central region of
the image are the only ones qualified to pass to the active phase, so that only the
corresponding segment appears at the output of the network. In internal generation of the
signal, one can use, for example, the criterion that a leader must have all of its excitatory
weights active, which corresponds to a pixel in the center of a homogeneous region.

The weights of the network are determined by comparing the input attribute
intensity of each neuron with the respective inputs of its neighboring neurons (1).
When the absolute value of the difference between two such inputs does not exceed a
threshold L, the respective weight is unitary; otherwise it is zero. Different attributes
of the image can be used to carry out the segmentation. The inputs I can also represent
vectors of attributes related to the pixels. In this case, the weights can be determined
using a vector distance measure, which implies more complex structures. In color
images, the Euclidean distance could be used to determine the weights, which would
result in segmentation by color similarity.

$$ w(i,j,k,l,t) = \begin{cases} 0 & \text{if } |I(i,j,t) - I(k,l,t)| > L \\ 1 & \text{if } |I(i,j,t) - I(k,l,t)| \le L \end{cases} \qquad (1) $$
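The weight rule and the internal leader criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the image is a plain list of rows of scalar intensities, a difference equal to the threshold is treated as "similar", and border neurons are judged only on the neighbors they actually have.

```python
def excitatory_weights(image, threshold):
    """Return weights[i][j]: a dict mapping each 8-neighbor (k, l) to 0 or 1."""
    rows, cols = len(image), len(image[0])
    weights = [[{} for _ in range(cols)] for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if (di, dj) == (0, 0) or not (0 <= k < rows and 0 <= l < cols):
                        continue
                    # Unit weight when the absolute input difference does not
                    # exceed the threshold (assumption for the equality case).
                    similar = abs(image[i][j] - image[k][l]) <= threshold
                    weights[i][j][(k, l)] = 1 if similar else 0
    return weights


def leaders(weights):
    """Internal leader criterion: every excitatory weight of the neuron is 1."""
    return [[all(w.values()) for w in row] for row in weights]


image = [[5, 5, 9],
         [5, 5, 9],
         [5, 5, 9]]
w = excitatory_weights(image, threshold=0)
leader_map = leaders(w)
```

With a zero threshold, only pixels whose whole neighborhood has exactly the same intensity become leaders, so in this toy image the leaders are the pixels of the left column.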



3 Implementation and Results 

To verify the functioning of the proposed network, two types of implementation were
carried out. The first is a simulation of the behavior of the network
through an algorithm implemented on a PC. The second uses the
MAX+PLUS II program from Altera [1] to simulate the network and verify
the viability of its implementation in digital devices [9].



3.1 Segmentation of an Artificial Image 

Figure 4 presents an example produced by the algorithm that simulates the
proposed network. The input image, with 100x310 pixels, is placed at the top left
position. The additional 7 images are the non-null segments obtained from the output of a
network with 100x310 neurons. In these 7 images, each pixel is a neuron output and
white represents the active neurons. The weights were calculated on the basis of
the intensity differences between neighboring pixels, with the threshold set to 0. The
neurons with all excitatory weights unitary were considered leaders. The first segment is
the background; the interior parts of the characters “P” and “A” are not included in it,
as they are not physically connected to it. All the characters were correctly isolated in
time, facilitating the application of a character recognition procedure.
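The way segments emerge one after another can be imitated sequentially. The sketch below is not the parallel network itself: it is a plain flood-fill analogue, assuming that raster order stands in for the priority order of the inhibitory connections, that a leader is a pixel similar to all of its existing 8-neighbors, and that a difference equal to the threshold counts as similar. All function names are hypothetical.

```python
from collections import deque


def neighbors(image, p):
    """8-neighbors of pixel p that lie inside the image."""
    rows, cols = len(image), len(image[0])
    return [(p[0] + di, p[1] + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
            and 0 <= p[0] + di < rows and 0 <= p[1] + dj < cols]


def similar(image, a, b, threshold):
    """True when the unit-weight condition holds between pixels a and b."""
    return abs(image[a[0]][a[1]] - image[b[0]][b[1]]) <= threshold


def segments_in_time(image, threshold):
    """Return the segments in the order in which they would become active."""
    seen, result = set(), []
    for i in range(len(image)):            # raster order plays the priority role
        for j in range(len(image[0])):
            p = (i, j)
            is_leader = all(similar(image, p, q, threshold)
                            for q in neighbors(image, p))
            if p in seen or not is_leader:
                continue
            # The leader excites every neuron reachable through unit weights.
            segment, queue = {p}, deque([p])
            seen.add(p)
            while queue:
                cur = queue.popleft()
                for q in neighbors(image, cur):
                    if q not in seen and similar(image, cur, q, threshold):
                        seen.add(q)
                        segment.add(q)
                        queue.append(q)
            result.append(segment)
    return result


image = [[0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9]]
segs = segments_in_time(image, threshold=0)
```

Regions that contain no leader never start a flood fill, which mirrors the fact that only segments with at least one leader appear at the output.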




Fig. 4. Segmentation using the algorithm that simulates the proposed network operation 



3.2 A Practical Application 

Figure 5 presents an 8-bit gray level image with 1482x2060 pixels, which was
obtained through transmission electron microscopy (TEM). This image represents a
silicon sample in which helium ions were implanted with the objective of reducing
defects in the crystalline structure. The helium accumulates in bubbles concentrated
in some areas of the silicon. To evaluate the process, it is necessary to
determine the gas volume in the sample. This volume can be found by
counting the bubbles and estimating their areas. This procedure must be carried
out for several images, which makes it a laborious task to perform manually [7].

Fig. 5. TEM image of a silicon sample with helium bubbles

The proposed network was used to segment the helium bubbles and simultaneously 
separate them in time, facilitating the implementation of an automatic process of 
counting and area measuring. Figure 6(a) presents a region extracted from Figure 5, 
which is used to verify the qualitative results of the proposed network. 






Fig. 6. Region extracted from Figure 5 (a) and filtered with the FPS algorithm (b)

Like other segmentation procedures, the proposed network is sensitive to noise in the
input image. This problem makes the use of a smoothing filter necessary. Figure 6(b)
presents the result of filtering the Figure 6(a) image using the FPS (Feature Preserving
Smoothing) algorithm [5]. The noise reduction and the preservation of the
contours of the helium bubbles can be easily observed.

Figure 7 presents the non-null segments obtained from the output of a network with
1482x2060 neurons. Besides the 9 bubbles, a segment representing the silicon
background was also supplied (bottom right position), which can be easily detected and
discarded in the automatic measuring procedure. The network weights were
calculated on the basis of the intensity differences between neighboring pixels, with the
threshold set to 3.

Fig. 7. Segments of the Figure 6(b) image obtained with the proposed neural network

The bubble areas can be obtained by computing the ratio between the number of
active neurons and the total number of neurons. Knowing the scale in pixel/nm
(Figure 5), the helium volume can be estimated. Visual area determination can be inexact
if the bubbles are not perfectly circular; this limitation does not affect the
proposed method. Incomplete, degraded and superimposed bubbles can lead to wrong
measurements. For these situations, the manual process can be used.
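As a small illustration of the area measurement, the following sketch counts the active neurons of one output segment. The function name and the `nm_per_pixel` scale are hypothetical; the actual scale would have to be read from the micrograph.

```python
def segment_area(segment, nm_per_pixel):
    """Return (fraction of active neurons, area in nm^2) for a binary segment."""
    active = sum(1 for row in segment for value in row if value)
    total = sum(len(row) for row in segment)
    # Each active neuron covers one pixel, i.e. nm_per_pixel ** 2 square nanometers.
    return active / total, active * nm_per_pixel ** 2


# Toy segment: 1 marks an active neuron (white pixel), 0 an inactive one.
segment = [[0, 1, 1],
           [0, 1, 0]]
fraction, area_nm2 = segment_area(segment, nm_per_pixel=2.0)
```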



4 Conclusion 

The use of oscillator networks to simulate the image segmentation capacity of the
human brain is a recent proposal with good results. In this context, the LEGION
network is the most consolidated model. Its massively parallel topology and its capacity
to separate the segments in time are highly attractive. On the other hand, its structure
based on differential equations has high complexity for digital machines. Another
disadvantage is its limited ability to segment a high number of objects simultaneously.
The high number of parameters and their unintuitive setting are also drawbacks.

The oscillatory neural network presented in this paper has a massively parallel
topology and the capacity to separate the homogeneous regions of the input image in time.
The network has no limitation on the number of segments, has lower
complexity, is suitable for implementation in digital vision chips, and uses few
parameters, which are easy to set. Other positive aspects are the ease of synchronization
and the low and easily predictable number of iterations needed to obtain the result.
The addition of a random signal to the inputs, as in LEGION, is not necessary,
which is also a factor of complexity reduction. Since the weights are obtained directly
from the attributes of the input image, there is no need for training, in contrast with
other neural networks. Finally, determining the weights on the basis of diverse
attributes of the input image can lead to more sophisticated segmentation procedures.

The good results obtained with the proposed network through an algorithm that
simulates its behavior demonstrate its consistency and its several positive characteristics.
The simulation of a small network and its implementation in an FPGA chip
confirm the correct functioning of the proposal. Also, preliminary statistical analyses
have shown that the proposed network can segment images with additive Gaussian
noise and a signal-to-noise ratio of 20 dB with less than 0.02% of misclassified pixels.

As there are no references to implementations of the LEGION network using
massively parallel digital hardware, the complexity gain of the proposed
network cannot be evaluated. On the other hand, computational simulations show that the
proposed network is much faster than the version of the LEGION network based on
differential equations. Specific studies for complexity reduction are under development in
order to implement practical networks using commercial FPGA devices.

Based on the results, it is concluded that the proposed oscillatory neural network is
attractive for image segmentation applications implemented in massively
parallel digital hardware. Its several advantages make it viable for the implementation of
vision chips using digital technology, allowing high-speed image segmentation.

Abstract. In this paper the use of Bayes rules and interpolation functions is
proposed in order to generate three-dimensional artificial neural cells
incorporating realistic biological neural shapes. A conditional vectorial
stochastic grammar has been developed to control and facilitate the parallel
growth of branches. An L-parser (a parser for L-systems) has also been developed
to guarantee that the grammar is free from mistakes before its use. This parser
also generates a group of points corresponding to the
morphologic structure of a neural cell. These points are visualized in a
three-dimensional viewer especially developed to show the generated neural cell.



1 Introduction 

The conventional theory-experiment basis for scientific research has been expanded
to include computer simulation, which is an increasingly important component of
many research lines. Indeed, one of the most important uses of computers in
neuroscience is the simulation of neural systems, due to the large amount of data that
needs to be analyzed [1]. Neural modeling has a fundamental role in both
experimental and theoretical studies to determine the characteristics of nervous
systems. There is an extraordinarily large number of neural cells in our brain, as well
as a great variety of them. Thus, it is extremely difficult to understand completely
how the communication between them occurs and how a specific area of the brain
responds to a specific stimulus. In this context, there are several studies that
try to understand the functionality of some neuron types. It is well known that the
shape of the cells is a fundamental factor in defining the communication between them
and their respective functions. Consequently, it has been shown that the more
complex the function of a neuron, the more complex its morphology. It has also been
shown that cells can change their shape, redirecting their dendritic branching to
respond to factors acting on them (such as the death of neighboring cells or
the presence of some attraction/repulsion factor). Therefore, simulations of
neural growth should take into account the shape and plasticity of the cells.

Hamilton [2], McCormick and Mulchandani [3], and Ascoli and Krichmar [4] have
published works related to the simulation of neural growth. These authors
suggest different methods for the growth of cells considering the shape of the natural
cells. For instance, Hamilton [2] and Ascoli and Krichmar [4] proposed stochastic
L-systems and simulated tropism. On the other hand, McCormick and Mulchandani
[3] proposed growth with stochastic behavior. However, none of these works takes
into account the history of the growth or offers an easy extension to the growth of
different cells considering their shape in a neural structure.

The aim of this work is the synthesis of three-dimensional neural cells using
interpolation functions and Bayes rules incorporated in stochastic graph grammars.
Thus, the neural development considers a short history of the growth and the
shape characteristics that are incorporated by the interpolation functions. These
functions are used as a way of generalizing several distribution functions obtained
from measures extracted from natural cells, besides allowing a reduction of the amount
of stored data. As will be shown, the generated cells have shapes very similar to those of
natural cells. Two tools have been built to assist in this simulation: i) L-Parser, a
parser of the generated graph grammar; ii) Neuron-Viewer, used to show the artificial
cells.



2 The Use of Bayes Rule 

The beginning of the generation of a cell corresponds to the first level of growth. In
this situation, in which there is no previous level to consider, the measures of the
artificial cells are obtained from a probability density function. However, very
often the current state of a cell depends on its previous state of growth (this is valid
only after the cell has ramified at least once). For instance, the diameter of a branch can
decrease as it grows. So, it is necessary to check the previous diameter to
determine the current one, since the current diameter has to be smaller than the previous one.
Bayes rule is used to allow this kind of implementation. First of all, it is
necessary to obtain a set of shape measures of the natural cells. This work has
concentrated on rat pyramidal cells (the files were acquired electronically from Canon
[5]). The software developed to obtain the measures reads computer-acquired
neuroanatomical files in Eutectic or SWC format [5]. Second, the measures
considered in the generation of the artificial cells are calculated. The main shape
measures considered are: number of primary branches, length and width of each
dendritic segment and arc segment, and branching angles. These measures are
organized according to the hierarchical level along the tree [6]. The next step is to
estimate the probability density functions and then the cumulative distribution
functions (CDFs), based on the measures of natural neural cells, which characterize the
morphological properties of the neural cells. Except for the angles, each measure
generates a bivariate CDF whose random components correspond to the hierarchical
level and the respective measure. The angles generate a trivariate CDF
whose random components correspond to two measures (two angles) besides the
hierarchical level. They are related to the torsion and curvature proposed in
the work of McCormick and Mulchandani [3] to generate three-dimensional cells. Two
different groups of CDFs have been generated: one corresponding to the apical
branches and another to the basal branches of the pyramidal cells. Fig. 1 presents two
examples of the CDFs created.



Fig. 1. Examples of CDFs used in the generation of splines: (a) CDF of dendritic
segment lengths by hierarchical level; (b) CDF of dendritic segment diameters by
hierarchical level.

Next, Bayes rule is used to generate CDFs that take the
previous measure into account when calculating the current one. According to Bayes rule, the
probability of occurrence of a specific event B is affected by whether another event
has occurred or not. Thus, it is necessary to calculate the probability of B
conditioned on the previous occurrence of A, denoted by P(B|A) (probability of B
given A) [7]. Its expression is given by:

$$ P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} = \frac{P(A \cap B)}{P(A)} \qquad (1) $$

The implementation considered the dependency of measures between two
branching levels. Therefore, a CDF of B|A was created, since the occurrence of
B depends on A. After that, the calculated B values are used to find the C values,
generating the CDF of C|B, and so on. Fig. 2 illustrates some examples of the
generated conditional CDFs. The conditional CDFs have been used to control the
length, diameter and angles of dendritic segments. For example, the length of segment n
depends on the length that occurred in segment n-1. In the case of the angles, Bayes
rules (or conditional CDFs) were used to control the two different angles considered
in this work: torsion and curvature. Fig. 3 illustrates some examples of the graphs
resulting from the application of Bayes rules to control these angles.
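One simple way to realize a conditional CDF of the current measure given the previous one is to bin the previous values and keep an empirical CDF per bin. The sketch below is an assumption-laden illustration, not the procedure used in this work: the bin edges, the function names and the sample pairs are all hypothetical.

```python
import bisect
import math


def conditional_cdf(pairs, edges):
    """Group 'current' values by the bin of their 'previous' value, sorted."""
    table = {}
    for previous, current in pairs:
        table.setdefault(bisect.bisect_right(edges, previous), []).append(current)
    for values in table.values():
        values.sort()
    return table


def sample_given(table, edges, previous, u):
    """Inverse-transform sampling: smallest value whose empirical CDF reaches u."""
    values = table[bisect.bisect_right(edges, previous)]
    index = max(math.ceil(u * len(values)) - 1, 0)
    return values[index]


# Made-up (previous diameter, current diameter) pairs, for illustration only.
pairs = [(1.0, 0.9), (1.1, 0.8), (1.2, 1.0), (2.5, 2.0), (2.6, 1.8)]
edges = [2.0]                       # two bins: previous < 2.0 and previous >= 2.0
table = conditional_cdf(pairs, edges)
median_small = sample_given(table, edges, previous=1.05, u=0.5)
```

In a generator, `u` would be a uniform random number in (0, 1], and the value returned for one level would become the `previous` argument for the next level.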



3 Obtaining the Interpolations 

The interpolation functions were generated from the CDFs described in the previous
section and are applied to all of those CDFs. In this paper we have not applied these
functions to the conditional CDFs, but the implementation can be extended to
include interpolation functions for conditional CDFs (this will be implemented in
future work). There are several interpolation methods. In this work we
used Thin Plate Splines (TPS), which are linear combinations of radial basis functions.
In a TPS, the interpolating surface has minimal deformation energy with respect to the
known points, yielding a smooth and continuous surface [8,9].





Fig. 2. Examples of conditional CDFs used in the generation of splines 





Fig. 3. Examples of conditional CDFs used to control the angles of dendritic segment. 



Two groups of information are necessary for this interpolation: the known values (in
this case, the measures of the neural cells) and the values we want to estimate (in this case,
the measures that will be used in the generation of the cells, according to the
probabilities and the level of each branch). The first group of information can be
represented by a set of n vectors $V = \{v_1, v_2, \ldots, v_n\}$, where
$v_i = (c_1, c_2, \ldots, c_d)$, $1 \le i \le n$, is a d-dimensional vector, and n is the
number of samples of a given measure. Once V is chosen, we have a set
$F(V) = \{f(v_1), f(v_2), \ldots, f(v_n)\}$, where $f(v)$ is the result of applying f
to v. The interpolation function can be described as:

$$ F(v) = \alpha \cdot v + \beta + \sum_{i=1}^{n} w_i \, \Phi(\lVert v - v_i \rVert) \qquad (2) $$

where $\Phi(x)$ is the radial basis function, $w_i$ is the contribution of the
function $\Phi(x)$ at each distance x, $\beta$ is a constant term corresponding to the
translation of $\Phi(x)$ needed to fit the interpolation, and $\alpha$ is a vector used for
the rotation of $\Phi(x)$ in each dimension. The function $\Phi(x)$ used in this work is
defined by:

$$ \Phi(x) = \begin{cases} 0 & \text{if } x = 0 \\ x^2 \log(x) & \text{otherwise} \end{cases} \qquad (3) $$

From equation (2), using the known values of V and F(V), we can calculate
$\alpha$, $\beta$, and w. After that, we use equation (4), together with the variables
calculated from (2), to estimate the unknown values of F(x), where x contains the values
of the measure for which we want to estimate F(x).
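The fitting step can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it solves for the weights, the linear term and the constant term with a small Gaussian elimination, and it assumes the usual thin plate spline side conditions (the weights sum to zero and are orthogonal to the sample coordinates), which are not stated in the text, to make the linear system square. Function names and sample values are hypothetical.

```python
import math


def phi(x):
    """Radial basis function: 0 at x = 0, x^2 log(x) otherwise."""
    return 0.0 if x == 0 else x * x * math.log(x)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(points, values):
    """Return (w, alpha, beta) so that the spline passes through every sample."""
    n, d = len(points), len(points[0])
    size = n + d + 1
    a = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            a[i][j] = phi(math.dist(points[i], points[j]))
        for k in range(d):
            a[i][n + k] = a[n + k][i] = points[i][k]   # linear term and side condition
        a[i][n + d] = a[n + d][i] = 1.0                # constant term and side condition
    sol = solve(a, list(values) + [0.0] * (d + 1))
    return sol[:n], sol[n:n + d], sol[n + d]


def evaluate(x, points, w, alpha, beta):
    """F(x) = alpha . x + beta + sum_i w_i * phi(||x - v_i||)."""
    linear = sum(ak * xk for ak, xk in zip(alpha, x)) + beta
    return linear + sum(wi * phi(math.dist(x, p)) for wi, p in zip(w, points))


# Made-up samples: (probability, hierarchical level) -> measure.
points = [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0), (1.0, 2.0), (0.5, 1.5)]
values = [1.0, 3.0, 0.5, 2.0, 1.7]
w, alpha, beta = fit_tps(points, values)
estimate = evaluate((0.25, 1.5), points, w, alpha, beta)
```

Once the coefficients are known, only `w`, `alpha`, `beta` and the sample points need to be stored, which is the data reduction mentioned earlier.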



The axes were transposed to facilitate the implementation, so that, given
the hierarchical level and a random number (between 0 and 1), the function
returns a measure. Therefore, F(v) in equation (2) corresponds to a measure (for
example, diameter), c_1 to the probability (CDF) and c_2 to the hierarchical level. In
equation (4), x is a vector with x_1 representing the probability and x_2 the hierarchical
level.

4 Generating Neural Cells 

To allow parallel growth, a stochastic graph grammar (a variation of
L-systems) was created that incorporates the interpolation functions and Bayes rules (or
conditional CDFs). Some measures are controlled by interpolation functions (for
example, length of dendritic segment and number of branches) and others by Bayes
rules (for example, angles and diameter). They are used to express variation in the
growth pattern in terms of the current interaction of the generation process or external
influences such as chemical markers or neurotrophic factors. The statistical functions
incorporate the precise statistical characterization of the morphological features of the
neurons, and can also take into account the stage of the generating process
(previous and current stages).

An L-system parser (L-parser) has also been developed in this work to compile
and interpret the grammar. It allows the user to enter the grammar (L-system)
and returns the object obtained from this grammar. It is very important to eliminate
mistakes made in the creation of the grammar before its use. This parser also facilitates the
definition of a specific grammar for each type of cell when it is necessary to generate
different types of cells in the same neural structure.

The probabilities of each grammar rule have been defined by the probability
functions obtained from the measures extracted from biological cells. This means that
there is no fixed probability for each production rule, as is commonly found in works
that use stochastic L-systems. Instead, a probability function is defined that
allows the probability to vary with the branching level of the branch. The
statistical functions used were the CDFs described in section 2, which were
incorporated in the grammar using the polynomial approximation described in section 3.

It must be emphasized that the grammar results are strongly influenced by the
interpolation functions that are related to the L-systems. These functions are used to
avoid the need to work with several data files, since they generalize a specific
measure for all the branching levels of the cell. So, it is not necessary to have a file
of measures for each level, but just the function that describes these measures as a
whole.

The grammar below represents a summarized model of our proposal, which
includes some control actions (lines 2, 3, 4, 6, 7, 7.1.1, 7.1.2 and 7.2.1). The execution
of a graphic operation is conditioned on the logic value of an expression (lines 6.1,
7.1 and 7.2). In line 4, ChooseSegmentLength(Level) chooses the dendritic
segment length. This function uses the interpolation polynomial described previously
for the dendritic segment length of each branching level.



Grammar:

1. Axiom: X;

2. Level := 1;

3. SumCp := 0;

4. LengthSd := ChooseSegmentLength(Level);

5. X -> F(LengthCp(Level), Curvature(Level), Torsion(Level), Diameter(Level)) Y

6. SumCp := SumCp + LengthCp(Level);

6.1 [SumCp < LengthSd] : Y -> X

7. Bprob := ChooseBranch(Level);

7.1 [Bprob < BranchProb(Level)] : Y -> [-X] [+X]

7.1.1 Level := Level + 1;

7.1.2 LengthSd := ChooseSegmentLength(Level);

7.2 [Bprob > BranchProb(Level)] : Y -> NULL

7.2.1 Level := Level - 1;



Since the execution of its actions does not depend only on probabilistic functions,
we call this model a Conditional and Stochastic Grammar.

In line 5, X is responsible for drawing a dendritic arc. The LengthCp(Level),
Torsion(Level), Curvature(Level) and Diameter(Level) functions are associated with
the F symbol to give information about the direction, length and thickness of the arc to be
drawn.

The dendritic branching process starts at level 1 with the X axiom and the choice
of the dendritic segment length (line 4). Note that there are several CDFs and thin
plate splines associated with F in line 5, such as arc length, torsion angle, curvature
angle and diameter. While the sum of the arc lengths does not reach the chosen segment
length (line 6.1), the branch continues to grow [10]. After this length is reached, it is
checked whether the branch will ramify or not (line 7). BranchProb(Level), in line 7.1,
gives the branching probability at this level. If Bprob is smaller than the
branching probability, then two new branches start to grow, thus starting a new
dendritic segment level. Otherwise, no new branching will appear in that branch
and it will stop growing (line 7.2). The process continues recursively until all the
conditions are satisfied (all the branches stop growing).
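The control flow of the grammar can be mimicked by a short recursive routine. This is only a sketch under several assumptions: recursion replaces the parallel rewriting of the L-system, the arc-length sum is restarted for every segment, a hypothetical `max_level` bound guarantees termination, and the three callables passed in are placeholders for the CDF and spline based functions of the grammar.

```python
import random


def grow(level, rng, choose_segment_length, length_cp, branch_prob, max_level=8):
    """Grow one dendritic segment, then recurse into its two child branches."""
    length_sd = choose_segment_length(level, rng)   # grammar lines 4 and 7.1.2
    sum_cp = 0.0                                    # grammar line 3
    arcs = 0
    while True:
        sum_cp += length_cp(level, rng)             # lines 5 and 6: one arc per pass
        arcs += 1
        if sum_cp >= length_sd:                     # condition of line 6.1 fails
            break
    node = {"level": level, "arcs": arcs, "children": []}
    bprob = rng.random()                            # grammar line 7
    if level < max_level and bprob < branch_prob(level):    # grammar line 7.1
        node["children"] = [
            grow(level + 1, rng, choose_segment_length, length_cp,
                 branch_prob, max_level)
            for _ in range(2)
        ]
    return node                                     # line 7.2: the branch stops


def count_segments(node):
    """Total number of dendritic segments in the generated tree."""
    return 1 + sum(count_segments(child) for child in node["children"])


# Deterministic placeholders so the outcome is easy to check by hand.
tree = grow(
    level=1,
    rng=random.Random(0),
    choose_segment_length=lambda level, rng: 10.0,
    length_cp=lambda level, rng: 4.0,
    branch_prob=lambda level: 1.0 if level < 3 else 0.0,
)
total = count_segments(tree)
```

With these placeholders every segment needs three arcs of length 4 to reach the chosen length of 10, and branching happens at levels 1 and 2 only, giving 1 + 2 + 4 segments.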

Each generated neuron is different from any other previously produced because of
the statistical functions used, but all cells share the same general characteristics
related to neural shape.

Each neuronal dendrite is described as a series of small cylindrical compartments.
Thus, these small compartments express the tortuosity of the branches. Fig. 4
illustrates some examples of neural cells generated by the proposed method.

Fig. 4. Neural cells generated by the proposed methods.

A three-dimensional visualization environment has been developed using
OpenGL. This environment allows volumetric visualization, including
illumination, transparency, texture, rotations, translations and changes of scale of the
generated neural structures. It also includes a menu of options for varying
visualization parameters such as color, light intensity, properties of the light and of
the material of the object (in this case, the neurons), transparency and wire-frame
visualization, besides being able to include textures. One example of this
environment can be seen in Fig. 5. It also allows defining what percentage of a
neuron will be drawn (100%, 90%, etc.). Fig. 6 shows some examples of this option.
In those examples it is possible to see that the branches grow in parallel.



Fig. 5. Environment to visualize three-dimensional cells. 



Fig. 6. Representation of three-dimensional cells drawn in different percentages.



5 Conclusion 

The study of cells presented in this paper can provide important support to guide the
future simulation of more complex cells, such as Purkinje cells.

It was shown in this work that the inclusion of Bayes rules in an L-system allows
the generation of the current branch to take into account parameters of the previous
stages. In addition, these rules allow relating two or more measures. For instance, we
can consider that the diameter of a branch depends on its length. Therefore, it is
possible to verify whether there is some relationship among the measures or hierarchical
levels. It is important to emphasize that this type of conditional statistics is already
considered for the angles, since we have to work with two angles because we are
generating three-dimensional cells. In this case, we first select a hierarchical
level and a random number. Then, using the polynomial approximation of the CDF, we
calculate an angle (curvature) and, given the curvature, we use Bayes rule to
calculate the torsion. Additional studies have been initiated on the dynamical
growth of the cell, including the trophic fields corresponding to varying concentrations
of ions, chemoattractants or electric fields.

Abstract. The success of evolutionary algorithms, in particular Factorized
Distribution Algorithms (FDA), for many pattern recognition tasks
heavily depends on our ability to reduce the number of function
evaluations.

This paper introduces a method to reduce the population size overhead.
We use low order marginals during the learning step and then compute
the maximum entropy joint distributions for the cliques of the graph. The
maximum entropy distribution is computed by an Iterative Proportional
Fitting embedded in a junction tree message passing scheme to ensure
consistency.

We show for the class of singly connected FDA that our method
outperforms the commonly used PLS sampling.



1 Introduction 

In recent years, evolutionary algorithms (EA) have been successfully applied to
a wide range of problems in the field of pattern recognition. The critical issue
in this application of EA is to reduce as far as possible the number of fitness
function evaluations, which depends directly on the population size of the EA.

In this paper we introduce a method which helps to reduce the population
size for a particular class of EA: the Estimation of Distribution Algorithms
(EDA) [14], which can be considered a substantial improvement of the genetic
algorithm paradigm [4].

The tractable subclass of EDA, the so-called Factorized Distribution Algo- 
rithms (FDA), learn factorizations of the joint distribution, which are trees, 
polytrees or general directed acyclic graphs. This information is used to con- 
struct a model from which new points are efficiently sampled. FDA algorithms 
use results of Graphical Models research [13]. 

A critical parameter both for learning and sampling is the required population
size, which grows exponentially with the size of the cliques of the graph.
This paper introduces a method to reduce the population size overhead. We use
low order marginals during the learning step and then compute the maximum
entropy joint distributions for the cliques of the graph. The maximum
entropy distribution is computed by Iterative Proportional Fitting embedded
in a junction tree message passing scheme to ensure consistency.



A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 683-690, 2003.
(c) Springer-Verlag Berlin Heidelberg 2003






A. Ochoa et al. 



The outline of the paper is as follows: Section 2 gives a short introduction on 
Graphical Models and Factorized Distribution Algorithms. Section 3 presents a 
single connected FDA. In the next section, we introduce our maximum entropy 
sampling. Then, we present our test bed and discuss the numerical results. Fi- 
nally, the main conclusions of our research are given. 

2 Background 

2.1 Bayesian Networks 

A Bayesian network is a directed acyclic graph (DAG) containing nodes, repre- 
senting the variables, and arcs, representing probabilistic dependencies among 
nodes. In this paper we will consider binary variables, but the results can be 
extended to the general discrete case. 

Let $X = \{X_1, \ldots, X_n\}$ denote the set of random variables. For any node $X_i$
and set of parents $\pi_{X_i}$, the Bayesian network specifies a conditional probability
distribution $p(x_i \mid \pi_{x_i})$. We use lower case letters to represent the variable values.

In general, Bayesian networks can be multiple connected. In this paper we
deal with single connected graphs: these are graphs where no more than one
(undirected) path connects any two variables. Examples are chains, trees,
forests and polytrees. Whereas in trees each edge is directed away from the
root node (so each node has at most one parent), in polytrees the direction of edges
is not restricted. A polytree generally has many roots (nodes without parents),
whereas a tree has only one root.

Polytrees retain many of the computational advantages of trees, but they
allow us to describe higher-order interactions than trees, because they allow
head to head patterns $X \rightarrow Z \leftarrow Y$. This type of pattern makes the parents
$X$ and $Y$ conditionally dependent given $Z$, which cannot be represented by
a tree. A polytree structure can be induced by second-order marginals using a
maximum weight spanning tree algorithm, similar to [1].

Given the structure of the probability distribution defined by the Bayesian 
network, the problem is to find a factorization defining this distribution. This 
factorization can be determined using a concept called junction tree. 

2.2 Junction Trees 

A junction tree [11,9] is an undirected tree whose nodes are clusters of
variables. The clusters satisfy the junction property: for any two clusters $V$ and
$W$ and any cluster $U$ on the unique path between $V$ and $W$ in the junction tree,
$V \cap W \subseteq U$. The edges between the clusters are labeled with the intersection of
the adjacent clusters; we call these labels separating sets or separators.

Junction trees are a very powerful tool for inference in Bayesian networks. 
For construction of a junction tree, given a general network, we refer to [11,9]. 
Given a polytree, a junction tree is simple to construct: For each variable that 
is not a root, create a node containing this variable and all its parents. The 
separators between the nodes always consist of only one variable. 
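
The polytree construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `polytree_junction_tree` and the `parents` dictionary format are hypothetical, and the example structure (5 with parents 1, 2; 7 with parents 2, 3, 4; 8 with parents 6, 7) is inferred from the factorizations given in Sect. 4.5 for Fig. 1.

```python
def polytree_junction_tree(parents):
    """Build the clusters and separators of a junction tree for a polytree.

    parents: dict mapping each variable to a tuple of its parents
             (roots map to an empty tuple). Hypothetical input format.
    Returns (clusters, edges); each edge is (cluster_a, cluster_b, separator).
    """
    # One cluster per non-root variable: the variable together with its parents.
    clusters = [frozenset((child, *ps)) for child, ps in parents.items() if ps]

    # Group the clusters by the variables they contain.
    containing = {}
    for cluster in clusters:
        for var in cluster:
            containing.setdefault(var, []).append(cluster)

    # Clusters sharing a variable are linked in a star around the first one;
    # in a polytree two clusters share at most one variable, so every
    # separator consists of a single variable.
    edges = []
    for var, shared in containing.items():
        hub = shared[0]
        for other in shared[1:]:
            edges.append((hub, other, frozenset([var])))
    return clusters, edges


# Structure assumed for Fig. 1 (roots: 1, 2, 3, 4, 6).
fig1_parents = {1: (), 2: (), 3: (), 4: (), 6: (),
                5: (1, 2), 7: (2, 3, 4), 8: (6, 7)}
fig1_clusters, fig1_edges = polytree_junction_tree(fig1_parents)
```

For this input the sketch yields three clusters, {1, 2, 5}, {2, 3, 4, 7} and {6, 7, 8}, joined by the separators {2} and {7}.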
Algorithm 1 FDA

Step 0 Set $t \leftarrow 1$. Generate $N \gg 0$ points randomly.

Step 1 Select $M \leq N$ points according to a selection method.

Step 2 Learn a Bayesian factorization of the selected set:

$$p^{s}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p^{s}(x_i \mid x_{i1}, x_{i2}, \ldots, x_{ir})$$

Step 3 Sample $N$ new points according to the distribution

$$p(x, t + 1) = p^{s}(x_1, \ldots, x_n)$$

Step 4 Set $t \leftarrow t + 1$. If termination criteria are not met, go to Step 1.



2.3 The Factorized Distribution Algorithms

Generally, in an FDA (see algorithm 1) the estimation (step 2) of the probability
factorization of the best individuals is used to sample (step 3) the points of the
next generation. There are neither mutation nor crossover operators.

The computational cost of an FDA implementation is determined by the
number of function evaluations, the memory needed to store the probabilistic
model, and the time spent to update and sample it. This time is often exponential in
the maximum number of variables that interact in the problem or, equivalently,
in the size of the building blocks. FDA algorithms which use only pairwise
dependencies are cheap.
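
To make the loop of algorithm 1 concrete, here is a minimal Python sketch. It is not PADA2: as a stand-in for the learning step it uses the simplest possible factorization, a product of univariate marginals, and truncation selection as the selection method. The function name `fda_univariate` and all parameter names and default values are assumptions made for illustration.

```python
import random


def fda_univariate(fitness, n, pop_size=100, tau=0.3, generations=20, seed=0):
    """FDA loop with a univariate factorization standing in for Step 2."""
    rng = random.Random(seed)
    # Step 0: generate N random binary points.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 1: truncation selection keeps the best tau * N points.
        pop.sort(key=fitness, reverse=True)
        selected = pop[: max(1, int(tau * pop_size))]
        # Step 2: "learn" the factorization; here p(x) = prod_i p(x_i).
        p_one = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]
        # Step 3: sample N new points from the learned distribution.
        pop = [[1 if rng.random() < p_one[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        # Step 4: the generation limit is the only termination criterion here.
    return max(pop, key=fitness)


# Usage: maximize the number of ones in a string of length 10.
best = fda_univariate(sum, n=10)
```

Only Steps 2 and 3 change when a richer factorization (a tree, polytree or general Bayesian network) replaces the univariate model; the rest of the loop stays the same.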

3 PADA2 — FDA Algorithm with Pairwise Independences 



The Polytree Approximation Distribution Algorithm (PADA) [17,16] is a specialization
of FDA (see algorithm 1) for single connected Bayesian networks. In
this paper we use PADA2 [16], which works with second order marginal distributions.
PADA2 is inspired by the algorithm proposed by Rebane and Pearl [15].
We briefly review this algorithm.

A polytree with $n$ variables has a maximum of $n - 1$ edges, otherwise it would
not be single connected. PADA2 chooses the edges that have the largest values
for the mutual information $H(X) + H(Y) - H(X, Y)$ [2]. The selection of the
edges is done by a greedy Maximum Weight Spanning Tree algorithm.
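
The skeleton selection can be illustrated with a short sketch: mutual information is estimated from empirical counts, and a greedy (Kruskal-style) procedure keeps the heaviest edges that do not close a cycle. The function names and the data layout (a list of equal-length tuples of discrete values) are assumptions, and any threshold PADA2 may apply to discard weak edges is not modeled here.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts, total):
    """Entropy (in nats) of the empirical distribution given by counts."""
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def mutual_information(data, i, j):
    """Empirical H(X_i) + H(X_j) - H(X_i, X_j)."""
    n = len(data)
    h_i = entropy(Counter(row[i] for row in data).values(), n)
    h_j = entropy(Counter(row[j] for row in data).values(), n)
    h_ij = entropy(Counter((row[i], row[j]) for row in data).values(), n)
    return h_i + h_j - h_ij


def max_weight_spanning_tree(data, n_vars):
    """Greedy maximum weight spanning tree over all variable pairs."""
    weighted = sorted(((mutual_information(data, i, j), i, j)
                       for i, j in combinations(range(n_vars), 2)), reverse=True)
    root = list(range(n_vars))  # union-find forest used to detect cycles

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    skeleton = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so no cycle is created
            root[ri] = rj
            skeleton.append((i, j))
    return skeleton


# Toy data: variable 1 copies variable 0, variable 2 is independent of both.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
toy_skeleton = max_weight_spanning_tree(toy, 3)
```

On the toy data the pair (0, 1) carries mutual information log 2 and is selected first; the remaining edge attaches variable 2 with weight zero.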

Once we have constructed the skeleton, a procedure tries to direct the edges
of the skeleton by using the following scheme: if $X - Z - Y \in$ skeleton, then
whenever $H(X) + H(Y) = H(X, Y)$ we orient the edges to $Z$. All other edges
are directed at random without introducing new head to head connections.

Fig. 1. A polytree learned by PADA2 and its junction tree.

Another distinguishing feature of PADA2 concerns the sampling step 3 (see
algorithm 1). To the best of our knowledge, all FDA algorithms introduced so
far that are based on Bayesian networks use the same Monte Carlo sampling
algorithm, namely Probabilistic Logic Sampling (PLS) [6]. It is very simple.
Given an ancestral ordering of all the variables (parents before children), the
method samples $X_i$ using $p(x_i \mid \pi_{x_i})$. One obvious problem of this method is
that the number of points required for estimating the conditional probabilities
correctly is exponential in the number of parents.
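
A minimal sketch of PLS for binary variables follows. The function name `pls_sample` and the table layout are hypothetical. Note that the conditional probability table of a variable with r parents needs 2^r entries, each of which must be estimated from the population, which is the exponential cost mentioned above.

```python
import random


def pls_sample(order, parents, cpt, rng):
    """Sample one point by Probabilistic Logic Sampling.

    order:   ancestral ordering of the variables (parents before children)
    parents: dict variable -> tuple of parent variables
    cpt:     dict variable -> {tuple of parent values: P(variable = 1 | parents)}
    """
    x = {}
    for var in order:
        # The parents have already been sampled thanks to the ancestral ordering.
        parent_values = tuple(x[p] for p in parents[var])
        x[var] = 1 if rng.random() < cpt[var][parent_values] else 0
    return x


# Usage with deterministic tables: a is always 1, b is always 0.
toy_parents = {"a": (), "b": ("a",)}
toy_cpt = {"a": {(): 1.0}, "b": {(0,): 0.0, (1,): 0.0}}
point = pls_sample(["a", "b"], toy_parents, toy_cpt, random.Random(0))
```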

Both in the process of learning and sampling, FDA that learn general
Bayesian networks need a population size which is exponential in the number
of parents in order to get reliable estimates of the conditional probabilities.
PADA2 is in a different situation: its learning algorithm deals only with second
order marginals. Figure 1 shows a polytree learned by PADA2. However, note
that the resulting junction tree contains a clique with four variables (7 and its
parents 2, 3, 4) and two cliques with three variables, so PLS sampling requires
fourth-order and third-order marginals. Therefore, PADA2 is in a very singular situation
when it uses PLS: what is gained during learning is then lost during the sampling
step.

In the next section we will present a novel method to overcome this problem. 
We fix some of the second order marginals used in the learning step, and then 
compute higher order marginals (like the ones in Fig. 1) as the maximum entropy 
distributions that obey the given second order marginals. It is important to note 
that, in contrast to PLS, the computation of these marginals does not need a 
larger sample size (population size) than the one used for learning. 

4 Maximum Entropy 

4.1 Entropy and the Maximum Entropy Principle 

The Entropy [2] of a probability distribution for a random variable $X$ is given
by

$$H(X) = -\sum_{x} p(x) \log p(x) . \qquad (1)$$

If $p(x) = 0$, then $\log p(x)$ is not defined. For this case, we set $0 \log 0 = 0$.
The entropy is a measure of the disorder in the distribution, or of our uncertainty
about the outcome of a random experiment.

The Maximum Entropy Principle states that, supposing we are looking for a
probability distribution fulfilling some given constraints, we should choose among
the possible solutions the one with the highest entropy. This is historically
founded on Bernoulli's "principle of insufficient reason". It was introduced and
advocated by Jaynes [7,8]. For a motivation and discussion of Maximum Entropy,
see [5].

4.2 Maximum Entropy Sampling in PADA2 

The method is a variation of previous methods for learning probability distribu- 
tions on junction trees [10,12]. 

First we construct a junction tree from the polytree, as described in Sect. 2.2.
On each node of the junction tree, we maintain a probability distribution of
the contained variables (recall: a child and its parents).

Then we find a closed tour through the junction tree that visits each node
at least once. On this tour we perform two steps:

1. Calculate the local distribution by iterative proportional fitting.

2. Pass a message to the next cluster on the tour, in order to ensure consistency
between the clusters.



4.3 Iterative Proportional Fitting 

The local iteration consists of finding the maximum entropy distribution of a
child and its parents, given the second order marginals. This is done by Iterative
Proportional Fitting (IPF). IPF iteratively computes a distribution $q^{(r)}(x)$ from
the given marginals $p_k(x_k)$, $k = 1, \ldots, K$, where $x_k$ is a subvector of $x$ and
$r = 0, 1, 2, \ldots$ is the iteration index. Let $n$ be the dimension of $x$ and $d_k$ be
the dimension of $x_k$. Then, starting from the uniform distribution, the update
formula is

$$q^{(r)}(x) = q^{(r-1)}(x) \, \frac{p_k(x_k)}{q^{(r-1)}(x_k)} \qquad (2)$$

with $k = ((r - 1) \bmod K) + 1$.

For the proof that IPF converges to the maximum entropy solution, see [3]
and references therein. Note that the effort is exponential in the clique size.
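
The IPF update can be sketched for a single clique of binary variables as follows. The function name `ipf`, the fixed number of sweeps and the toy marginals are assumptions chosen for illustration; the table over all 2^n states makes the exponential effort explicit.

```python
from itertools import product


def ipf(n, marginals, sweeps=50):
    """Iterative Proportional Fitting on n binary variables.

    marginals: list of (indices, target), where indices is a tuple of variable
               positions and target maps value tuples to probabilities.
    Returns a dict mapping each of the 2**n states to its probability.
    """
    states = list(product((0, 1), repeat=n))
    q = {x: 1.0 / len(states) for x in states}  # start from the uniform distribution
    for _ in range(sweeps):
        for indices, target in marginals:  # cycle through the K given marginals
            # Current marginal of q on the selected subvector.
            current = {}
            for x, prob in q.items():
                key = tuple(x[i] for i in indices)
                current[key] = current.get(key, 0.0) + prob
            # Rescale q so that this marginal matches its target.
            for x in states:
                key = tuple(x[i] for i in indices)
                if current[key] > 0:
                    q[x] *= target[key] / current[key]
    return q


# Toy clique: child x2 with parents x0 and x1, given two pairwise marginals
# that agree on p(x2 = 0) = 0.5.
p02 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p12 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
q_fit = ipf(3, [((0, 2), p02), ((1, 2), p12)])
```

In this toy case the fitted distribution equals p(x0, x2) p(x1, x2) / p(x2), the maximum entropy distribution with the two prescribed marginals, so for instance the state (0, 0, 0) receives probability 0.4 * 0.3 / 0.5 = 0.24.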



4.4 Message Passing 

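A message from a cluster $W$ to a cluster $V$, separated by $S = V \cap W$, is sent by
the following algorithm:

$$q^{W}(x_S) = \sum_{x_{W \setminus S}} q^{W}(x_S, x_{W \setminus S})$$

$$q^{V,\mathrm{new}}(x) = q^{V,\mathrm{old}}(x) \, \frac{q^{W}(x|_S)}{q^{V,\mathrm{old}}(x|_S)}$$

Here $x|_S$ denotes the vector $x$, restricted to the variables in $S$.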
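
A message of this kind can be sketched in Python as follows. The sketch assumes the usual separator-based form of the update (the receiving table is rescaled so that its marginal on the separator agrees with that of the sender); the function name `pass_message` and the table layout are hypothetical.

```python
def pass_message(q_w, vars_w, q_v, vars_v):
    """Send a message from cluster W to cluster V and return the updated V table.

    q_w, q_v:       dicts mapping value tuples to probabilities
    vars_w, vars_v: tuples naming the variables, in the order used by the keys
    """
    separator = [v for v in vars_w if v in vars_v]
    pos_w = [vars_w.index(s) for s in separator]
    pos_v = [vars_v.index(s) for s in separator]

    def marginal(table, positions):
        out = {}
        for x, prob in table.items():
            key = tuple(x[i] for i in positions)
            out[key] = out.get(key, 0.0) + prob
        return out

    sep_w = marginal(q_w, pos_w)  # marginal of the sender on the separator
    sep_v = marginal(q_v, pos_v)  # current marginal of the receiver on the separator

    updated = {}
    for x, prob in q_v.items():
        key = tuple(x[i] for i in pos_v)
        updated[x] = prob * sep_w[key] / sep_v[key] if sep_v[key] > 0 else 0.0
    return updated


# Usage: W = (a, s) sends to V = (s, b); the separator is s.
table_w = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.5}
table_v = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
new_v = pass_message(table_w, ("a", "s"), table_v, ("s", "b"))
```

After the message, the marginal of V on s equals that of W, here p(s = 0) = 0.3 and p(s = 1) = 0.7.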
4.5 Sampling in the Junction Tree

Using the distributions within the junction tree, the sampling of points works as 
follows: 

Start from any node in the junction tree, sample values for the variables from 
the local probability distribution. Then, proceed to the neighbors and sample 
values for the new variables, conditioned on the variables which have already 
been sampled. When each node has been visited, the sampled individual is com- 
plete. 

For example, for the structure in Fig. 1, this algorithm samples using the
factorization

$$p_{JT}(x) = p(x_1, x_2, x_5) \, p(x_3, x_4, x_7 \mid x_2) \, p(x_6, x_8 \mid x_7),$$

whereas PLS uses the factorization

$$p_{PLS}(x) = p(x_1) \, p(x_2) \, p(x_3) \, p(x_4) \, p(x_5 \mid x_1, x_2) \, p(x_6) \, p(x_7 \mid x_2, x_3, x_4) \, p(x_8 \mid x_6, x_7)$$

which is not an exact factorization of the underlying distribution.
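
The sampling procedure of this section can be sketched as follows. The sketch assumes that the clusters are supplied in a valid visiting order (each cluster after the first is a neighbor of an already visited one) and that their tables are consistent on the separators; the function name `sample_junction_tree` and the table layout are hypothetical.

```python
import random


def sample_junction_tree(clusters, rng):
    """Sample one point from the cluster distributions of a junction tree.

    clusters: list of (variables, table) in visiting order; table maps value
              tuples (ordered like variables) to probabilities.
    """
    x = {}
    for variables, table in clusters:
        # Conditioning: keep only the states that agree with the values
        # sampled so far (variables not yet sampled are unconstrained).
        consistent = [(state, prob) for state, prob in table.items()
                      if prob > 0 and all(x.get(v, val) == val
                                          for v, val in zip(variables, state))]
        if not consistent:
            raise ValueError("cluster tables are inconsistent on a separator")
        # Roulette-wheel draw among the remaining states; renormalization is
        # implicit because the threshold is scaled by their total mass.
        threshold = rng.random() * sum(prob for _, prob in consistent)
        accumulated = 0.0
        for state, prob in consistent:
            accumulated += prob
            if threshold < accumulated:
                break
        x.update(zip(variables, state))
    return x


# Usage with a tiny chain of two clusters sharing the variable b.
toy_clusters = [(("a", "b"), {(1, 0): 1.0}),
                (("b", "c"), {(0, 1): 0.5, (1, 0): 0.5})]
individual = sample_junction_tree(toy_clusters, random.Random(0))
```

In the toy chain the first cluster fixes a = 1 and b = 0, and conditioning the second cluster on b = 0 leaves only c = 1.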



5 Numerical Results 

Now we present the set of additive decomposable functions (ADF) that will be 
used in our experiments. 

1. The Deceptive Function of order $k$, $f_{dec}^{k}$, is defined as follows. Let $u$ denote the
number of 1s in the string. We set $f_{dec}^{k} = k$ if $u = k$, and $f_{dec}^{k} = k - 1 - u$
otherwise. The function $F_{dec}^{k}$ is a separable function of subset size $k$, with
$n = k \cdot l$:

$$F_{dec}^{k}(x) = \sum_{i=1}^{l} f_{dec}^{k}(x_{ki-k+1} + \ldots + x_{ki})$$



2. The next function is also a separable ADF with blocks of length 5. In each
block the FirstPolytree5 function is evaluated. This function has the following
property: its Boltzmann distribution with parameter $\beta = 2$ has a
polytree structure with edges $x_1 \rightarrow x_3$, $x_2 \rightarrow x_3$, $x_3 \rightarrow x_5$ and
$x_4 \rightarrow x_5$. The reader can easily check this by constructing the Boltzmann
distribution and then checking marginal dependencies. The definition of the
function $f_{Poly}^{5}(x)$ is given below.

x      f(x)        x      f(x)        x      f(x)        x      f(x)
00000  -1.141      01000  -0.753      10000  -3.527      11000   -6.664
00001   1.334      01001   1.723      10001  -1.051      11001   -4.189
00010  -5.353      01010  -4.964      10010  -7.738      11010  -10.876
00011  -1.700      01011  -1.311      10011  -4.085      11011   -7.223
00100   0.063      01100   1.454      10100   1.002      11100   -1.133
00101  -0.815      01101   0.576      10101   0.124      11101   -2.011
00110  -0.952      01110   0.439      10110  -0.013      11110   -2.148
00111  -0.652      01111   0.739      10111   0.286      11111   -1.849
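
As an illustration of the first test function, the separable Deceptive function of order k can be sketched in Python as follows. The function names are hypothetical; the block value follows the definition in item 1 (k if all k bits of the block are 1, and k - 1 - u otherwise, where u is the number of 1s).

```python
def deceptive_block(u, k):
    """Value of one block of size k that contains u ones."""
    return k if u == k else k - 1 - u


def deceptive(x, k):
    """Separable Deceptive function: sum of the block values over the string."""
    if len(x) % k != 0:
        raise ValueError("string length must be a multiple of k")
    return sum(deceptive_block(sum(x[i:i + k]), k) for i in range(0, len(x), k))


# Deceptive 4 with 20 variables: the optimum is the all-ones string, while
# the all-zeros string is the deceptive attractor.
optimum_value = deceptive([1] * 20, 4)
attractor_value = deceptive([0] * 20, 4)
```

Every block other than the all-ones block is rewarded for having fewer 1s, which pulls a search guided by low-order statistics away from the optimum.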
We recall that the basic claim of our research is that our maximum entropy
approach to sampling requires a smaller population size than PLS. In this section
we compare these two sampling methods for PADA2.

All the experiments use a fixed truncation selection pressure ($\tau = 0.3$), do not
use elitism and are run for a maximum of 20 generations. We perform 100 runs
for each experiment. We use as test functions Deceptive 4 (with 20 variables),
Goldberg Deceptive 3 (21 variables) and FirstPolytree5 (20 variables).

As can be seen in Table 1, the improvement in comparison with conventional
PLS is enormous. For example, for Deceptive 4, our new method finds the optimum in
93% of the runs with only 800 individuals, whereas PLS, even with a population
size of 5000, succeeds in only 64%.

It is also remarkable that the number of generations until success stays the 
same or even improves. It has also stabilized, as can be seen from the decrease 
in the standard deviation. 



Table 1. Numerical results. D4 - Deceptive 4, D3 - Goldberg Deceptive 3, FP5 - FirstPolytree5.
N - population size, %S - success rate, Gc - generation where the optimum
is found, MES - maximum entropy sampling, PLS - probabilistic logic sampling.

          N         200            600            800            5000
D4   PLS  %S        1              12             16             64
          Gc        5 ± 0.0        8.0 ± 3.9      8.3 ± 3.5      9.23 ± 3.5
     MES  %S        21             76             93             100
          Gc        11.14 ± 4.5    8.6 ± 3.2      8.4 ± 2.5      6.1 ± 1.3
D3   PLS  %S        0              8              10             92
          Gc        -              9.75 ± 1.5     8.7 ± 3.2      7.21 ± 1.2
     MES  %S        2              69             90             100
          Gc        8.5 ± 0.7      7.4 ± 1.1      7.0 ± 1.2      5.84 ± 0.9
FP5  PLS  %S        25             50             54             55
          Gc        10.08 ± 2.08   10.42 ± 2.59   10.59 ± 2.34   10.8 ± 1.5
     MES  %S        59             100            100            100
          Gc        5.14 ± 1.07    3.93 ± 0.7     3.66 ± 0.59    2.92 ± 0.44




6 Summary and Conclusions 

The paper introduces a new method for sampling individuals in EDA. Here 
we restrict ourselves to single connected Bayesian networks (polytrees). In a 
forthcoming paper, we will discuss the multiple connected case. 

The polytree canonically induces a junction tree. Its nodes contain the higher-order
marginal distributions that are needed in the sampling phase. These are
computed from the given second order marginals using the maximum entropy
principle. The conventional "Probabilistic Logic Sampling" is replaced by sampling
inside the junction tree.
We explore the method by applying it to three benchmark problems. The improvement
in comparison with the previous method turns out to be tremendous.
We conclude that using this sampling, we can greatly reduce the population
size. This results in a big saving of function evaluations, which is critical for any
pattern recognition application of evolutionary computation.